ABSTRACT


This report presents the results of the application of Avco Data
Analysis and Prediction Techniques (ADAPT) to the derivation of new
algorithms for the prediction of future sunspot activity. The
ADAPT-derived algorithms show a factor of 2 to 3 reduction in the expected
2-sigma errors in the estimates of the 81-day running average
of the Zurich sunspot numbers. The report presents: (1) The best 
estimates for sunspot cycles 20 and 21, (2) a comparison of the ADAPT 
performance with conventional techniques, and (3) specific approaches 
to further reduction in the errors of estimated sunspot activity and to 
recovery of earlier sunspot historical data. 

The ADAPT programs are used both to derive regression algorithms
for prediction of the entire 11-year sunspot cycle from the preceding
two cycles and to derive extrapolation algorithms for extrapolating a
given sunspot cycle based on any available portion of that cycle. It is
suggested that further improvement in sunspot predictions is possible
by including more data in the learning set, by accounting for the present
value of the sunspot number when estimating the immediate future, and,
for the extrapolation algorithm, by using a three-cycle base instead of
a single-cycle base.

The estimates obtained show that cycle 20 should last somewhat longer 
than previously anticipated, with the minimum of the 81-day running
average occurring early in 1977. The estimates also show a lower
peak activity for cycle 21 than previous estimates, with a maximum 
sunspot number of approximately 60 for cycle 21. 
1.0 INTRODUCTION


This report presents the results of a study which has the objective of develop- 
ing improved numerical techniques for predicting future sunspot numbers. 

The improved techniques are based on the use of the Avco Data Analysis and 
Prediction Techniques (ADAPT). ADAPT is a unique set of programs which 
first obtains the best representation for any given set of data. This best repre- 
sentation then allows one to characterize the data and to derive empirical 
prediction, classification, extrapolation and/or clustering laws in an extremely 
efficient manner. Previous applications of the ADAPT programs to reentry 
physics, sonar, engine diagnostics, medical, meteorological, and solar physics 
problems have demonstrated that the empirical prediction and extrapolation laws 
derived using the ADAPT programs have significant advantages relative to those 
developed by more classical techniques (see refs. 1 thru 7). The methods 
currently employed for estimating future sunspot activity are based primarily 
upon a classical empirical regression scheme developed by McNish and Lincoln 
(see ref. 8). Thus, the ADAPT techniques should provide significant improve- 
ment in the capability for estimating future sunspot activity. This report pre- 
sents the results of a study which demonstrates this improvement. 

The development of improved estimates of future sunspot activity is important 
to the study of solar physics in general and possibly to astrophysics. The 
estimates of solar activity have a very practical importance in that they are 
part of many geophysical models for predicting such quantities as satellite
lifetimes, cosmic ray intensity, and atmospheric and climatic phenomena.

The linear regression techniques developed by McNish and Lincoln have been 
modified by different investigators by using different lengths of past data for
deriving the predictions of future sunspot numbers. The current estimates for
cycle 20 are based on the application of these techniques to the data obtained
from cycles 1 thru 19. References 9 and 10 summarize these current predictions,
which are reported monthly by solar activity indices memos such as
References 11 and 12. Until the second quarter of 1972, this method was used
by NASA/MSFC for the current estimates of sunspot cycle 21. Beginning in the
second quarter of 1972 the NASA/MSFC estimates for cycle 21 have been based 
on a modification of this method introduced by Sleeper in Reference 13. This 
modification consists primarily of an analysis of the similarity of sunspot cycles 
which has produced classification of the sunspot cycles from cycles 1 thru 20 
according to their polarity and their mode. By limiting the selection of the cycles 
used in the linear regression forecast to that class of cycle which is being pre- 
dicted, Sleeper has been able to obtain improved predictions. 

In order to evaluate the advantages of the ADAPT prediction techniques, it is
necessary to compare them with the currently used techniques. For this purpose
we shall call the application of the McNish and Lincoln linear regression
techniques to the first 19 sunspot cycles, as described in Ref. 10, "simple
regression", and their application to a limited set of 9 negative cycles, based
upon the criteria outlined in Ref. 13, "selective regression". In addition to the
current simple regression and selective regression techniques, two separate 
ADAPT techniques are evaluated. The first of these is designated the ADAPT 
prediction technique, which refers to the algorithms developed for predicting 
the current sunspot cycle from the preceding two sunspot cycles. This provides
the capability to extend sunspot activity estimates beyond the present cycle. For
completing the present cycle, the ADAPT programs have been used in their 
extrapolation mode to develop extrapolations of the present cycle. This is 
referred to as the ADAPT extrapolation technique. Both of these ADAPT 
techniques have been used to predict the 81-day running average of the daily
Zurich sunspot numbers. This introduces some minor difficulties in com- 
paring the ADAPT results with the simple and selective regression methods, 
since these latter methods have been used to estimate the twelve month run- 
ning average of the mean monthly Zurich sunspot numbers. 

This report will present the results of the studies carried out and the recom- 
mendations for the best method for estimating future solar activity, improve- 
ments which may still be made in the methods, and the application of the 
ADAPT techniques to further understanding of sunspot activity. The report 
also contains a description of the ADAPT programs and a detailed description 
of the efforts carried out to develop both the ADAPT predictions and
extrapolations. Additional applications of the ADAPT programs to analysis of
sunspot data are also outlined. The performance of the ADAPT sunspot estimates
is compared with the performance of the currently used techniques.


2.0 RESULTS AND RECOMMENDATIONS 


The primary result of the application of the ADAPT techniques to the problem
of predicting future sunspot activity is a factor of 2 to 3 reduction in the RMS
error of the 1-sigma estimate of the Zurich sunspot numbers for the remainder
of the current sunspot cycle and the first half of the next sunspot cycle. To 
achieve this reduction in the error one must use both the ADAPT extrapolative 
and predictive algorithms. The present study, as well as comparison of the 
present study with that of Reference 13, provides evidence that further signi- 
ficant improvement in the ADAPT derived algorithms is almost certain if the 
analysis recommended in this report is carried out. 

The best ADAPT estimate of cycles 20 and 21, as compared to the latest available
conventional estimate (Ref. 12), is presented in Figure 2.1. In comparing the
ADAPT estimates with the conventional estimates the reader must realize that
the conventional estimates are for a 12-month running average while the ADAPT
estimates are for an 81-day running average. The primary effect of this
difference is that the 12-month running average reaches a given value
approximately 3 to 4 months after the 81-day running average has reached that
value. The 81-day running average should also have higher peaks and lower
minima than the 12-month running average. The ADAPT analysis indicates that the
next minimum for the 81-day running average will occur in February of 1977,
which translates to May or June 1977 for the 12-month running average.
Considering the three to four month correction which should be incorporated in
the 12-month running average, ADAPT predicts generally higher values of the
sunspot activity for the remainder of cycle 20 but approximately a 25% lower
peak activity for cycle 21. For a detailed comparison of these ADAPT
predictions with both the prediction in Fig. 2.1 and the latest predictions of
Ref. 13, the reader is referred to Section 4.4. It is important to note,
however, that the mid-1977 date for the next minimum and the greater sunspot
activity during the remaining portion of cycle 20 are in remarkable agreement
with the best estimate presented by Sleeper in Ref. 13.
Although the maximum sunspot number of 60 predicted for cycle 21 by ADAPT is 
lower than the peak values of approximately 80 predicted by Sleeper in Ref. 13, 
it is in better agreement with the trends of peak magnitude for negative cycles 
and the maximum sunspot number versus period correlations in Ref. 13. 
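
The effect of the averaging window on peaks and minima can be illustrated with a minimal sketch. It assumes monthly values, trailing windows, and a 3-month window standing in for the 81-day average; the synthetic cycle shape and the name `running_average` are illustrative only and are not taken from this report.

```python
import math

def running_average(values, window):
    """Trailing running average: entry k averages `window` consecutive values."""
    out = []
    for i in range(window - 1, len(values)):
        out.append(sum(values[i - window + 1:i + 1]) / window)
    return out

# Synthetic 132-month (11-year) cycle with a fast rise and slow decay.
# The shape is made up for illustration; it is not sunspot data.
cycle = [100.0 * (m / 40.0) * math.exp(1.0 - m / 40.0) for m in range(132)]

short_avg = running_average(cycle, 3)    # stand-in for the 81-day average
long_avg = running_average(cycle, 12)    # 12-month average

peak_short, peak_long = max(short_avg), max(long_avg)
min_short, min_long = min(short_avg), min(long_avg)
print(peak_short >= peak_long, min_short <= min_long)
```

Because each 12-month trailing average is the mean of four consecutive 3-month averages, its peak cannot exceed the 3-month peak and its minimum cannot fall below the 3-month minimum.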

Figure 2.2 compares the performance of the two ADAPT methods developed in
this study with the simple and selective regression techniques. This curve also
compares all four of these techniques with the simple assumption that the sunspot
cycle is equal to the mean of sunspot cycles 1 thru 19. Again the reader is
cautioned that the simple and selective regression as well as the mean sunspot
cycle shown in this figure are for 12-month averages whereas the errors for the
ADAPT predictions are for the 81-day running average. Figure 2.2 plots the
expected 2-sigma error (i.e., 95% confidence limit) as a function of position as
defined by number of months since start of cycle. Both the simple and selective
regression techniques have errors presented for long and near term estimates. The
curves indicated by the circled numeral 1 are for the long term estimates. 
Here we define long term estimates as those estimates which use the available 
portion of the present cycle to predict the next cycle. The curves indicated 
by the circled numerals 2 and 3 designate the short term estimates using the 
simple and selective regression. The short term estimates are defined as 
estimates using the available portion of the present cycle to predict the re- 
mainder of the present cycle. Thus, in their functional use, the short term 
estimates correspond essentially to the ADAPT extrapolations and the long 
term estimates correspond to the ADAPT prediction. The ADAPT prediction 
performance is indicated by the solid line interrupted by circles. The solid 
lines interrupted by plus signs, crosses, and squares indicate the performance 
of the ADAPT extrapolations using 38, 76, and 93 months of the current cycle 
to extrapolate the remainder of the cycle. 

Although detailed discussion of the conclusions is presented in Section 4.4, 
we may summarize the conclusions reached from this study as follows: 

1. The best method for estimating future 81-day running averages
of the Zurich sunspot number is to use the ADAPT prediction
algorithm for estimating sunspot activity in all cycles for which
less than 70 to 80 months of the current cycle are available.
For those cycles for which 70 to 80 months are available the
ADAPT extrapolation should be used. A simple interpolation
from the current value to the ADAPT extrapolated value for the
period in the immediate future, i.e., approximately the next
3 to 6 months, will provide further improvement to this ADAPT
estimate for the very near term.

2. The ADAPT predictions are approximately a factor of 2 to 3
better than the long term predictions based on either the simple
or selective regressions over the first half of the sunspot cycle.
The ADAPT predictions are similar to both the simple and
selective predictions over the third quarter of the sunspot cycle
and again the ADAPT predictions show a significant advantage
for the end of the sunspot cycle.

3. The ADAPT extrapolations based on the first quarter to the first
half of the data in the sunspot cycle are similar to both the selective
and simple regression methods except for the immediate future
(i.e., 3 to 6 months) when the ADAPT techniques are inferior since
they do not use the knowledge of the present value to correct the
immediate future. The ADAPT extrapolations of the first quarter
and first half also show significant advantages near the end of
the cycle. The ADAPT third quarter extrapolation is significantly
better than any of the selective or simple regression techniques
except in the immediate future, which can be corrected as outlined
in conclusion 1 above.
4. Approximately three quarters of the variation between sunspot
cycles occurs in the first half of the cycle. ADAPT's ability to 
account for this is the major contribution to the improved accuracy 
of the ADAPT predictions over the first half of the cycle. 

5. The ADAPT prediction based on the preceding two cycles is 
better than the ADAPT extrapolation based on the first quarter 
of the cycle. 

6. Incorporation of the preceding 2 cycles in the ADAPT extrap- 
olation base will significantly improve ADAPT extrapolations 
using the first quarter and the first half of the cycle and may 
improve even the third quarter extrapolations. 

7. The ADAPT derived algorithms can be significantly improved 
by the addition of variables such as those outlined by Sleeper, 
including such items as the angular momentum of the solar 
system, the polarity, and position in the 180 year cycle of the 
cycles being predicted. 

8. For purposes of the prediction techniques currently available
the 81-day running average may be assumed to be identical
to the three-month running average.

9. After a period of three to five years, the simple regression 
techniques are equivalent to assuming that the predicted value 
is equal to the mean cycle of the cycles which are used to 
carry out the predictions. 

10. Sunspot cycle 19 is an anomalous cycle. 
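
Conclusion 1 can be read as a selection rule plus a near-term correction. The sketch below is one possible reading and not the procedure used in this study: the 75-month switch point (chosen inside the stated 70 to 80 month range), the linear form of the interpolation, the default 6-month blend length, and both function names are assumptions.

```python
def choose_method(months_available, switch_month=75):
    """Pick the ADAPT algorithm from how much of the current cycle is observed.

    switch_month is an assumed value inside the 70 to 80 month range.
    """
    return "prediction" if months_available < switch_month else "extrapolation"

def blend_near_term(current_value, estimates, blend_months=6):
    """Interpolate linearly from the current value to the ADAPT estimate.

    estimates[k] is the ADAPT estimate k + 1 months ahead. Over the first
    blend_months months the estimate is replaced by a straight line from
    current_value to the ADAPT value at the end of the blend period; later
    months are returned unchanged.
    """
    n = min(blend_months, len(estimates))
    if n == 0:
        return []
    target = estimates[n - 1]
    blended = []
    for k, value in enumerate(estimates):
        if k < n:
            frac = (k + 1) / n
            blended.append(current_value + frac * (target - current_value))
        else:
            blended.append(value)
    return blended

# Hypothetical numbers: observed value 30, ADAPT estimates for 4 months ahead.
print(choose_method(38), choose_method(93))
print(blend_near_term(30.0, [20.0, 19.0, 18.0, 17.0], blend_months=3))
```

The blend reaches the ADAPT value exactly at the end of the blend period, so the corrected curve joins the uncorrected estimate without a jump.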

The above conclusions are further confirmed by the summary of the RMS errors
of various quantities which are presented in Table 2.1. The first column of
Table 2.1 presents the RMS error of the 1-sigma error in the estimate. The
second column presents the RMS error of the estimates of all of the sunspot
cycles in the learning data. The third column, designated by 23. 4, is an
estimate of the quantity in the second column which can be performed relatively
simply from the standard outputs of the ADAPT algorithms. The fourth and fifth
columns present the RMS error for the predictions of cycles 19 and 20. There
is considerable evidence presented in this report of the anomalous nature of cycle 
19, and therefore, it is not a good basis for evaluating the ability of the learning 
data to project the performance of an algorithm. The anomalous nature of this 
cycle was indicated by some of the ADAPT validity criteria. 

Analysis carried out in this report has shown that the ADAPT techniques should 
also be extremely useful for recovering earlier sunspot data, for predicting sun- 
spot cycle properties and for performing cluster analysis. The ADAPT scatter 
plots clearly show the separation into the mode 1 and mode 2 cycles as introduced 
by Sleeper in Reference 13. 
The most important recommendations resulting from this study are as
follows: 

1. Future estimates of the 81 -day running average of the Zurich 
sunspot number should be based on the techniques outlined in 
conclusion 1 above and updated on approximately a quarterly
basis.

2. The present ADAPT algorithms should be immediately upgraded 
using the existing data and technology which has been suggested 
elsewhere in this report. 

3. Studies should be performed to use the ADAPT techniques in
conjunction with conventional approaches to recover additional 
sunspot cycles. 

4. After completion of the studies to recover additional sunspot 
cycles this information should be used to develop additional 
ADAPT algorithms for the long range forecasting of solar 
activity. 

5. ADAPT should be used for clustering studies of sunspot cycles. 
** For Entire Available Cycle

FIGURE 2.1 COMPARISON OF ADAPT AND CONVENTIONAL PREDICTION FOR
CYCLES 20 AND 21

[Figure axis labels: MONTHS SINCE BEGINNING OF CYCLE; MONTHS SINCE PREVIOUS MINIMUM]

3.0 DESCRIPTION OF ADAPT

3.1 Definition of Data Histories

The ADAPT techniques address themselves to the representation and empirical
analysis of data which appear as data histories, i.e., an indexed series of
numbers. The features of ADAPT which make it advantageous for empirical
analysis are reviewed in Appendix A. In the present case the indexing variable
is time, in months. The histories may consist of numbers with different
physical meaning; for example, quantities such as cycle type, mode, and/or
position in the 180 year Jose cycle may be adjoined to the sunspot numbers.
This was not done in the present study but offers an interesting method of
incorporating additional information for the predictions.

The histories may be given in continuous (analog) form or in discrete form; 
since the ADAPT programs operate in digital computers, analog histories are 
each digitized into a finite set of N numbers, so a history is treated as an N- 
dimensional vector in Euclidean space. If there are M histories, the result 
is an N x M matrix of numbers. 

The choice of the N numbers to represent each history is to some extent arbi- 
trary, the chief criterion being that the desired physical phenomena are properly 
contained in the N numbers. It may be desirable to perform some pre-processing 
on the given data to bring out these features before entering the ADAPT programs. 
Such pre-processing could include Fourier transforms, normalization, taking 
logarithms, etc. From a theoretical viewpoint, one could even use continuous 
data at this stage, since the first step in ADAPT (discussed below) produces a 
discrete output even when the input is continuous functions instead of vectors. 
However, the realities of numerical analysis on digital computers require that 
the input be in vector (digitized) form rather than functional (analog) form. 

3.2 Optimal Representation of Data Histories

With the M input history vectors defined, the first step in ADAPT is to construct 
from them an orthonormal set of base vectors by the classical Gram-Schmidt 
procedure. This ignores any history vectors linearly dependent on others, and 
results in a set of NC orthonormal N-component vectors where NC is less than 
or equal to the smaller of N and M. (The maximum number of linearly independent 
N-component vectors is N, so if M > N, some of the histories are surely linearly 
dependent on others. If M < N, then there are a maximum of M orthogonal base
vectors.) The data history vectors are now expressed in the Gram-Schmidt base
by their components along the NC Gram-Schmidt vectors, so each history is given
by NC components, and there are M x NC components altogether.
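
The Gram-Schmidt step described above can be sketched as follows. This is a minimal illustration, not the ADAPT implementation: the tolerance `tol`, the function names, and the three made-up histories (N = 4, M = 3) are assumptions.

```python
import math

def gram_schmidt(histories, tol=1e-10):
    """Return orthonormal base vectors spanning the given history vectors.

    A history that is linearly dependent on earlier ones leaves a residual of
    (numerically) zero length and is skipped, so the number of base vectors
    NC never exceeds min(N, M).
    """
    base = []
    for h in histories:
        residual = list(h)
        for b in base:
            proj = sum(r * bi for r, bi in zip(residual, b))
            residual = [r - proj * bi for r, bi in zip(residual, b)]
        norm = math.sqrt(sum(r * r for r in residual))
        if norm > tol:
            base.append([r / norm for r in residual])
    return base

def components(history, base):
    """Components of a history along each base vector (dot products)."""
    return [sum(x * bi for x, bi in zip(history, b)) for b in base]

histories = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # twice the first history: linearly dependent
    [1.0, 0.0, -1.0, 0.0],
]
base = gram_schmidt(histories)
coeffs = [components(h, base) for h in histories]
print(len(base))  # NC
```

Here the dependent second history contributes no base vector, so each history is described by NC = 2 components rather than N = 4 values.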

This step has accomplished the task of discretizing the data, regardless of the 
form of the input or of its dimensionality. The Gram-Schmidt base vectors 
have N components (or would be continuous functions if the input histories
were functions instead of vectors) but the representation of the histories by
their components in this base is independent of N, depending only on NC.
Therefore, the Gram-Schmidt representation is largely independent of the
particular way the data histories were digitized, assuming only that the 
numbers chosen to represent the histories properly contain their important 
features. In addition, there is usually a reduction in the number of numbers 
at this stage, since the M x N original components have been reduced to 
M x NC. 

However, there is no reason to believe that the Gram-Schmidt base is the 
best one for representing the data. It is really an arbitrary orthonormal 
set of base vectors determined solely by the order in which the histories 
are arranged. The next step is to find another orthonormal base which is 
in some sense the best for the given data as a whole.*

To achieve this, a new set of NC N-dimensional orthonormal vectors, rotated
from the Gram-Schmidt set, is postulated. This set is to be chosen in an 
ordered fashion, so that the first vector is the best, and so on. Only a limited 
number, NR < NC, of these vectors will be used as new base vectors for 
representing the histories. They are chosen as follows: Each history is 
represented by its coefficients in the Gram-Schmidt base, and is projected 
onto the NR new vectors, giving M x NR components in the new base. If 
there were as many new vectors as Gram-Schmidt vectors, NR = NC, this 
would be an exact representation of the history vectors, but since NR < NC,
it is only approximate, leaving an error vector as the difference between the 
history vector and its representation in the new vector base. The square 
magnitude of this error vector is a measure of the error for each history, 
and the average of these square magnitudes for all histories is the mean square 
error incurred by representing the history vectors in only NR new base vectors. 

The new orthonormal set of vectors is chosen by minimizing this mean square
error, thus defining the meaning of a "best" set of vectors. If only one vector
is used, NR = 1, it is that vector which makes the one-vector representation
error the smallest. If a second vector is used also, it is chosen so that together
with the first vector, it minimizes the two-vector representation error.
This is continued for as many vectors, i.e., as large a value of NR < NC,
as is necessary or desirable.

*The approach taken is analogous to the expansion of functions in a set of
orthonormal functions, of which Fourier series is the most common example.
When one of the classical boundary value problems of mathematical physics
is solved, the appropriate differential equation defines a set of orthonormal
functions. To satisfy a given function on the boundary, this boundary function
is expanded in this set of orthonormal functions. In the present case, there
is no differential equation to define a particular set of orthonormal functions.
However, it is possible to make this data define its own best set of such
functions, or vectors.

When formulated mathematically, this criterion requires the maximization
of a quadratic form whose unknowns are the Gram-Schmidt components of
one of the "best" base vectors, and whose coefficient matrix is the covariance
matrix of the Gram-Schmidt components of the input histories. This problem
is a classical one in linear algebra, which often appears under the name
Karhunen-Loeve Expansion or principal components analysis of a matrix.*

The solutions for the unknown vector components are the normalized eigen- 
vectors of the covariance matrix, and the resulting values of the quadratic 
form are the eigenvalues of this matrix. Once they are obtained, they are 
simply arranged in order of decreasing size of the eigenvalues. The largest 
eigenvalue gives the most reduction in mean square error that can be achieved 
with only one new base vector and the corresponding eigenvector is this new 
base vector. The next largest eigenvalue gives the most reduction in the error 
that can be achieved by using a second new base vector in addition to the first 
one found above, and this second vector is the eigenvector of this second 
largest eigenvalue. This process can be continued until the desired accuracy 
is achieved. The sum of the NR largest eigenvalues gives the maximum mean 
square error reduction which can be achieved with NR new base vectors; when 
adding additional eigenvalues does not significantly increase this sum, the use 
of the corresponding eigenvectors as additional base vectors does not significantly 
improve the representation. 
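
The eigenvalue ordering described above can be illustrated with a small sketch. Two points are assumptions rather than statements from this report: the "covariance matrix" is taken to be the uncentered average outer product of the component vectors, and power iteration with deflation is swapped in as the eigen-solver only to keep the example free of external libraries. The component values are made up.

```python
import math

def second_moment_matrix(vectors):
    """Average outer product (1/M) * sum of v v^T over the M vectors."""
    m, n = len(vectors), len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / m for j in range(n)]
            for i in range(n)]

def top_eigenpairs(matrix, count, iterations=500):
    """Largest eigenpairs of a symmetric matrix, found in decreasing order
    by power iteration; each found pair is deflated out of the matrix."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for k in range(count):
        vec = [1.0 + 0.1 * (i + k) for i in range(n)]   # arbitrary start
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        value = sum(vec[i] * work[i][j] * vec[j]
                    for i in range(n) for j in range(n))
        pairs.append((value, vec))
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs

# Made-up Gram-Schmidt components: M = 4 histories, NC = 3 components each.
gs_components = [[4.0, 1.0, 0.2], [3.0, -1.0, 0.1],
                 [5.0, 0.5, -0.3], [4.5, -0.8, 0.1]]
pairs = top_eigenpairs(second_moment_matrix(gs_components), 3)
eigenvalues = [value for value, _ in pairs]
mean_square = sum(sum(x * x for x in v) for v in gs_components) / len(gs_components)
cumulative = [sum(eigenvalues[:k + 1]) for k in range(len(eigenvalues))]
print(eigenvalues, cumulative, mean_square)
```

With all NC eigenvectors retained, the cumulative sum of eigenvalues equals the average square magnitude of the histories; the point at which the cumulative sum stops growing appreciably suggests a usable value of NR.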

A convenient measure of the degree of representation achieved with a given 
number of base vectors is the sum of the eigenvalues of the vectors used, 
divided by the average square magnitude of the original data history vectors.
This represents the reduction in mean square error achieved divided by the
total error reduction possible; in statistical terms this is the percent of the 
variation of the data explained by the representation used. Since information 
is only conveyed by the variation in the data and the variation has the form of 
an energy, the percent variation explained is also known as the information 
energy. A similar measure of representation which is applied to the individual 
data vectors is the ratio of the square magnitude of the data vector in the NR 
base vector system to the original square magnitude of the data vector. This 
provides a measure of the adequacy of the empirically derived base for
representing each history, and when applied to a test history serves as the
basis for the a priori test of the validity of applying the empirical data
analysis to the test case.

*For a detailed discussion of the Karhunen-Loeve Expansion and its advantages
in empirical data analysis see: S. Watanabe, "Karhunen-Loeve Expansion
and Factor Analysis Theoretical Remarks and Application", Transactions of
the Fourth Prague Conference Information Theory, Statistical Decision Functions
and Random Processes, 1965, pp. 635-660.

For each history the NR components in the optimal system are the optimal 
representation of the data in the sense described above. Alternatively, these 
components may be interpreted as coefficients of the Fourier series of optimal 
orthonormal functions representing the history. 

The optimal components are used in all further empirical analysis. Thus, 
the original M x N numbers representing M histories have been reduced to 
M x NR components, plus N x NR numbers to define the optimal vector base. 

Since the base system is optimal, the number of terms, NR, necessary to 
give a useful representation of history is small, often of the order of 10 or 
less, and the reduction in the number of numbers is usually large. 

In the process described so far, the optimal vectors are represented by their 
NC components in the Gram-Schmidt base, but this means they are a linear
combination of the NC Gram-Schmidt vectors, the coefficients being these NC 
components. Since the Gram-Schmidt vectors are N-dimensional vectors, the 
optimal vectors can also be represented in the original N-dimensional space 
of the data history vectors by performing the linear combination. 

The ADAPT representation process just outlined can be clarified with the 
simple example of two input histories, which has been carried through analy- 
tically in Appendix B. For this special case the first optimal function is 
proportional to the average of the two history functions, the second to their 
difference, a result in accord with simple intuition. The relative sizes of the
two eigenvalues are found to depend on the degree of correlation of the two
histories, which has implications discussed later.
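
The two-history result can be checked with a short derivation. This is a sketch under stated assumptions and does not reproduce Appendix B: the two histories $h_1$, $h_2$ are taken to be linearly independent with equal square magnitude $a^2$, the matrix being diagonalized is taken to be the uncentered average outer product, and the symbols $a$, $c$, $\rho$, $\alpha$, $\beta$ are introduced here for illustration.

```latex
\begin{align*}
  &|h_1|^2 = |h_2|^2 = a^2, \qquad h_1 \cdot h_2 = c = \rho\, a^2, \qquad
   C = \tfrac{1}{2}\left(h_1 h_1^{T} + h_2 h_2^{T}\right) \\
  &u = \alpha h_1 + \beta h_2 \;\Rightarrow\;
   C u = \tfrac{1}{2}\left[(\alpha a^2 + \beta c)\, h_1
                         + (\alpha c + \beta a^2)\, h_2\right] \\
  &C u = \lambda u \;\Longleftrightarrow\;
   \tfrac{1}{2}\begin{pmatrix} a^2 & c \\ c & a^2 \end{pmatrix}
   \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
   = \lambda \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \\
  &\alpha = \beta:\quad u_1 \propto h_1 + h_2, \qquad
   \lambda_1 = \tfrac{1}{2}\,a^2 (1 + \rho) \\
  &\alpha = -\beta:\quad u_2 \propto h_1 - h_2, \qquad
   \lambda_2 = \tfrac{1}{2}\,a^2 (1 - \rho) \\
  &\lambda_1 + \lambda_2 = a^2, \qquad
   \frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{1 + \rho}{2}
\end{align*}
```

Under these assumptions, two highly correlated histories ($\rho$ near 1) are represented almost entirely by the first optimal vector, while uncorrelated histories ($\rho = 0$) split the variation equally between the two.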

3.3 Sunspot Estimates and Analysis Using Optimal Representation

Having arrived at the optimal (Karhunen-Loeve) representation, attention is 
now turned to use of the optimal components for generating empirical algorithms 
to perform the parameter prediction and extrapolation required for this study. 

It should be noted that this optimal representation is also well suited for empirical 
clustering analysis, classification and clutter subtraction. For clustering analysis, 
one represents each history by a point in optimal coordinates, and the degree of 
similarity of two histories can be defined as the distance between the two points. 

If the optimal representations are normalized, this distance is simply related to 
the correlation of the two histories. Thus, the application of visual, nearest 
neighbor, or other cluster identification schemes to points (i.e. data histories) 
of the optimal space will lead to identification of natural clusters and algorithms 
to identify their members. 
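
The relation between distance and correlation for normalized representations can be sketched numerically. The sketch assumes that "correlation" means the normalized inner product of the two component vectors (no mean removal); the component values and function names are illustrative only.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def distance(a, b):
    """Euclidean distance between two points in optimal coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up optimal components of two histories (NR = 3).
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 4.0])

rho = sum(x * y for x, y in zip(a, b))   # normalized inner product
d = distance(a, b)

# For unit vectors: d^2 = |a|^2 + |b|^2 - 2 a.b = 2 (1 - rho)
print(d * d, 2.0 * (1.0 - rho))
```

Small distances therefore correspond to correlations near 1, which is why nearest-neighbor grouping in the optimal space groups similar histories.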
For classification (including the special classification problem of detection)
the same representation of a history as a point in optimal coordinates is used. 

A number of parametric schemes and linear non-parametric schemes which 
can be applied are included in the ADAPT programs. They may be extended 
to multi-class problems by repetitive application, separating a different class 
with each application. If the statistics of the learning data are Gaussian the 
maximum likelihood technique, which is included as an option in ADAPT, 
may be used for multi-class classification problems. 

Clutter subtraction is the unique capability of the ADAPT programs to subtract
out characteristics associated with identifiable phenomena in the process
of constructing the optimal space. The phenomenon to be omitted from the
optimal space is first characterized and then the directions associated with
this phenomenon are given a low or zero weighting in constructing the optimal
space. The resulting optimal space or functions can then be used to reconstruct
histories which will not contain characterizable portions of the signature
due to this phenomenon.

Two types of empirical analysis are used in conjunction with the ADAPT
representation in this study: parameter estimation, to predict future sunspot
cycles, and extrapolation, to extend the present sunspot cycle. The
mathematical basis for these operations will now be discussed.

The ADAPT technique for constructing an algorithm to predict a physical param- 
eter associated with each history again makes use of the components of each 
history in the optimal system. For every history in the learning data, the known 
value of the parameter is written as a linear combination of the optimal com- 
ponents. The unknowns are the coefficients in this linear combination, which 
are taken to be the same for every history. The sum, over all histories, of 
the square error of this linear representation is then minimized to determine 
the coefficients. This amounts to a regression of the parameter on the optimal 
components. When the coefficients are found, they can then be used with optimal 
components of any new history to obtain an estimate of the value of the parameter 
for that history. 
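
The regression of a parameter on the optimal components can be sketched as an ordinary least-squares fit. This is a minimal illustration, not the ADAPT code: the normal-equation solver, the function names, and the learning data (constructed so that the parameter is exactly 2 times the first component minus 3 times the second) are assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_coefficients(components, parameter):
    """Coefficients minimizing the summed square error of
    parameter[m] ~ sum_k coeff[k] * components[m][k] over all histories m."""
    nr = len(components[0])
    normal = [[sum(row[i] * row[j] for row in components) for j in range(nr)]
              for i in range(nr)]
    rhs = [sum(row[i] * p for row, p in zip(components, parameter))
           for i in range(nr)]
    return solve_linear(normal, rhs)

def estimate(coeffs, new_components):
    """Apply the fitted coefficients to the components of a new history."""
    return sum(c * x for c, x in zip(coeffs, new_components))

# Made-up learning data: M = 5 histories, NR = 2 optimal components each.
learning = [[1.0, 0.5], [2.0, -1.0], [0.5, 0.3], [1.5, 1.2], [3.0, -0.4]]
known = [2.0 * c1 - 3.0 * c2 for c1, c2 in learning]

coeffs = fit_coefficients(learning, known)
print(coeffs, estimate(coeffs, [1.0, 1.0]))
```

Because the same coefficients are used for every history, the fit requires at least as many independent learning histories as retained components.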


Parameter estimation may also be used to predict data histories rather than 
single parameters. The approach is to utilize the ADAPT parameter estima- 
tion programs to predict the components of the ADAPT representation of the 
history to be predicted. Thus, the ADAPT representation plays two roles in 
this type of analysis. The first role is to define the optimal coordinate system 
in which to represent the history to predict, so that the number of components 
which must be predicted is minimized. The second role of the representation
is the usual ADAPT role of representing the data histories used as predictors,
to minimize the dimensionality of the space in which the regression is to
be carried out. For the case of predicting future sunspot cycles both
of these representations could be the same if a single sunspot cycle
were used to predict the next sunspot cycle. However, for the present study
it was decided, after consultation with NASA, that the preceding two sunspot 
cycles should be used to predict any given sunspot cycle. Thus, the components 
to be predicted are the components of the optimal representation for a single 
sunspot cycle. The data histories to be used to make this prediction, which 
must first be represented in the optimal coordinate system prior to carrying 
out the regression analysis, are a set of two adjacent sunspot cycles. Regres- 
sion is then performed to relate the components of a single sunspot cycle to the 
components of the preceding two sunspot cycles. Once the components of the 
sunspot cycle have been predicted from this regression equation, they may be 
used in the Fourier series representation of the sunspot cycle, which provides 
the prediction of the sunspot number as a function of date. 

It is not necessary to actually find the optimal coefficients of a new history 
under investigation in order to apply an ADAPT-derived algorithm. The
transformation from the N-dimensional data vector space to the NR-dimensional 
optimal vector space can be inverted and incorporated into the algorithm vectors. 
Applying the algorithm to a new data vector then involves primarily the dot
product, or a combination of dot products, of this N-dimensional data vector
with one or more N-dimensional algorithm vectors, a rather simple procedure.
Thus, the algorithm for predicting the coefficients of the next cycle can be
expressed as a dot product of the end-of-month values of the 81-day running
averages of the daily Zurich sunspot number for the preceding two cycles 
with the ADAPT-derived relative importance vector. 
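
The following sketch illustrates why a single dot product suffices. It assumes
an orthonormal set of optimal functions and a mean-removed data vector; the
basis, coefficients and data values are hypothetical, and algorithm_vector is
an illustrative name rather than an ADAPT routine.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def algorithm_vector(basis, coeffs):
    """Fold the regression coefficients into one data-space vector:
    a[t] = sum_k coeffs[k] * basis[k][t]."""
    n = len(basis[0])
    return [sum(c * h[t] for c, h in zip(coeffs, basis)) for t in range(n)]


# Hypothetical orthonormal basis (NR = 2) in a data space of dimension N = 4.
basis = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
coeffs = [3.0, -2.0]                 # regression coefficients in optimal space
mean = [10.0, 20.0, 30.0, 20.0]      # average learning history
x = [12.0, 19.0, 33.0, 24.0]         # new data vector

centered = [xi - mi for xi, mi in zip(x, mean)]

# Two-step route: project onto the basis, then apply the regression.
components = [dot(h, centered) for h in basis]
two_step = dot(coeffs, components)

# One-step route: a single dot product with the algorithm vector.
a = algorithm_vector(basis, coeffs)
one_step = dot(a, centered)
```

Both routes give the same estimate, so the optimal components of the new
history never need to be formed explicitly.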

ADAPT offers a unique approach to extrapolating data histories. The entire 
learning data history, including the region over which one hopes eventually to
extrapolate, is used to find the optimal representation for the histories. One 
then determines the best components for the history to be extrapolated by 
making a least squares fit of the available portion of the history to a
generalized Fourier series, using the part of the optimal orthogonal functions
which covers the available portion of the history. These components are then
used to reconstruct the entire history from the complete optimal orthogonal
functions. Clearly, the number of points in the known portion of the history
must be greater than or equal to the number of components which are being
estimated. 

In the special case where the number of points to be matched equals the number 
of components to be determined, an exact fit rather than a least squares fit can 
be found. A detailed description of the ADAPT history extrapolation program 
which performs this extrapolation is given in Appendix C. 
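
A minimal sketch of this extrapolation step is given below. It is not the
program of Appendix C: the function names and the two illustrative basis
functions are hypothetical, and the history is assumed to have had the average
learning history removed already.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def extrapolate(basis, known):
    """Fit components to the first len(known) points, then rebuild the
    full history from the complete functions."""
    n_known, k = len(known), len(basis)
    if n_known < k:
        raise ValueError("need at least as many known points as components")
    # Only the part of each function covering the available portion is used.
    part = [h[:n_known] for h in basis]
    gram = [[sum(p * q for p, q in zip(part[i], part[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * y for p, y in zip(part[i], known)) for i in range(k)]
    comps = solve_linear(gram, rhs)
    n_full = len(basis[0])
    full = [sum(c * h[t] for c, h in zip(comps, basis)) for t in range(n_full)]
    return comps, full


# Two hypothetical functions, orthogonal over the full 6-point history.
basis = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]]
known = [-0.5, 0.5, 1.5]            # first 3 points of 2*h1 + 0.5*h2
comps, full = extrapolate(basis, known)
```

Note that the truncated functions are generally no longer orthogonal over the
available portion, which is why a least squares solve is needed rather than a
simple projection.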

3.4 Evaluation of Performance and Validity of Estimates 

An objective of the ADAPT approach to empirical data analysis is to provide the 
analyst with information regarding both the performance and the validity of the 
algorithms which he develops. Performance tells the analyst how good his 
algorithm is when it is applied to test data belonging to the same population as 
the learning data used to derive the algorithm. The validity criterion is a
measure of how well the test data belong to the population of the learning data. 

Thus, the availability of performance data allows the analyst to select the best 
algorithm and to verify that the performance of the algorithm is sufficient to 
ensure that it is based on physics and not merely on a fortuitous mathematical 
manipulation of the data. The validity criterion provides the user with a
measure of the applicability of the algorithm to the particular case being tested. 

In the ADAPT programs the performance of regression algorithms is defined 
by the classical correlation coefficient and by the ratio, $\sigma_{RAT}$, of the
standard deviation of the error to the standard deviation about the mean of the
learning data. In the sunspot study these measures for the regression analysis 
are used to decide which algorithms should be used in the prediction of the
coefficients of the sunspot cycle. 
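
The two performance measures can be illustrated with a short sketch. It
follows only the verbal definition above (standard deviation of the error
divided by the standard deviation about the mean); the function names and the
sample values are hypothetical. Any common normalization of the two standard
deviations cancels in the ratio.

```python
import math


def sigma_rat(actual, estimated):
    """Standard deviation of the error divided by the standard deviation
    of the actual values about their mean."""
    mean = sum(actual) / len(actual)
    err = sum((e - a) ** 2 for a, e in zip(actual, estimated))
    spread = sum((a - mean) ** 2 for a in actual)
    return math.sqrt(err / spread)


def correlation(actual, estimated):
    """Classical (Pearson) correlation coefficient."""
    ma = sum(actual) / len(actual)
    me = sum(estimated) / len(estimated)
    cov = sum((a - ma) * (e - me) for a, e in zip(actual, estimated))
    va = sum((a - ma) ** 2 for a in actual)
    ve = sum((e - me) ** 2 for e in estimated)
    return cov / math.sqrt(va * ve)


# Hypothetical coefficient values for five learning cases.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.5, 1.5, 3.0, 4.5, 4.5]

ratio = sigma_rat(actual, estimated)
rho = correlation(actual, estimated)
# Estimating the mean for every case gives a ratio of exactly 1.
baseline = sigma_rat(actual, [3.0] * 5)
```

A ratio near 1 therefore means the algorithm does no better than the mean of
the learning data, while a ratio near 0 indicates a nearly exact fit.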

Equations 1 through 3 define $\sigma_{RAT}$ for the estimation of each
coefficient of a data history, 


where $\hat{Y}_j$ is the estimated coefficient, $\bar{Y}$ is the mean of the
coefficients of the learning data, $Y_j$ is the actual value of the coefficient
of the ADAPT representation, and M is the number of cases used in the learning
data. Note that this performance estimate, like all of the other performance
estimates provided as part of the derivation of ADAPT algorithms, is based on
the learning data. 

The evaluation of the prediction of entire histories, such as the sunspot cycle, 
requires methods other than the correlation coefficient or $\sigma_{RAT}$. One
classical method of evaluating the performance of the prediction of a data
history is to consider the two-sigma error bounds about the history. This
measure of performance is particularly useful for understanding the performance
of any particular estimate as a function of the indexing variable. However, for
comparing a large number of algorithms or methods of predicting histories, the
large quantity of numbers involved makes the use of this measure rather awkward.
To overcome this, a single number which summarizes the performance over the
entire history has been used in this study to compare various methods and
algorithms in the initial phases. This measure is the RMS error between the
estimated and actual data history. 

The calculation of the two-sigma band for a data history requires that one 
estimate the expected standard deviation between the estimated and actual 
data histories. This is an example of a problem for which the ADAPT formulation
offers considerable simplification in the amount of calculation required. 
Consider the task of evaluating the standard deviation between the estimated 
sunspot number, $\hat{R}_{jt}$, and the actual sunspot number, $R_{jt}$. Since
ADAPT considers both the actual and estimated values as represented by Fourier
series of optimal orthogonal functions, the actual and estimated values can be
expressed as: 

$$R_{jt} = \bar{R}_t + \sum_{\ell=1}^{L} Y_{j\ell} H_{\ell t} \qquad (4)$$

$$\hat{R}_{jt} = \bar{R}_t + \sum_{\ell=1}^{Q} \hat{Y}_{j\ell} H_{\ell t} \qquad (5)$$

where L is the number of terms required to achieve 100% representation, Q is 
the number of terms utilized in the ADAPT representation used to estimate the 
sunspot number, the bar indicates the average value, the hat indicates the 
estimated value, and the Y's are the coefficients of the generalized Fourier
series utilizing the optimal orthogonal functions designated by $H_\ell$. Note
that the coefficients are independent of the indexing parameter t and that the
optimal functions are the same for all histories (i.e., all j's). 

Noting that the standard deviation, $\sigma_t$, is given by 

$$\sigma_t^2 = \frac{1}{M} \sum_{j=1}^{M} \left( R_{jt} - \hat{R}_{jt} \right)^2 \qquad (6)$$

for each value of time (in months), one may utilize equations 4 and 5 to write
the standard deviation of the estimate at time t as: 

$$\sigma_t^2 = \frac{1}{M} \sum_{j=1}^{M} \left( \sum_{\ell=1}^{L} \Delta Y_{j\ell} H_{\ell t} \right)^2 \qquad (7)$$

Thus, $\Delta Y_{j\ell}$ is the difference between the actual and the estimated
coefficient (i.e., the error in estimating the coefficient) and is simply equal
to the actual coefficient for $\ell$ greater than Q. This follows from the fact
that since only Q terms are being used in this analysis, this is equivalent to
estimating zero for the coefficients of all of the terms beyond the Qth term in
the series. 
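
The point-by-point standard deviation can be sketched directly from the
coefficient errors. The sketch assumes a 1/M normalization over the M learning
histories and treats every coefficient beyond the Qth as estimated by zero;
the function name sigma_by_time and all numerical values are hypothetical.

```python
import math


def sigma_by_time(actual_coeffs, est_coeffs, functions):
    """actual_coeffs[j] holds all L coefficients of history j,
    est_coeffs[j] holds the Q estimated coefficients of history j,
    functions[l][t] is optimal function l at index t."""
    m = len(actual_coeffs)
    n = len(functions[0])
    out = []
    for t in range(n):
        total = 0.0
        for y, y_hat in zip(actual_coeffs, est_coeffs):
            q = len(y_hat)
            # Error is actual minus estimate for the first Q terms and the
            # actual coefficient itself beyond term Q.
            delta = [y[l] - y_hat[l] if l < q else y[l] for l in range(len(y))]
            total += sum(d * h[t] for d, h in zip(delta, functions)) ** 2
        out.append(math.sqrt(total / m))
    return out


# Hypothetical case: L = 2 functions, N = 2 points, M = 2 histories, Q = 1.
functions = [[1.0, 0.0], [0.0, 1.0]]
actual_coeffs = [[3.0, 1.0], [1.0, -1.0]]
est_coeffs = [[2.0], [2.0]]
sigmas = sigma_by_time(actual_coeffs, est_coeffs, functions)
```

Working in coefficient space means that only L numbers per history, rather
than the full history, need to be retained to form the two-sigma band.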

The standard deviation is found at each point in the data history, and thus one 
has one value of the standard deviation for each indexing value, applicable to 
all histories. The RMS error between the estimated and actual history will be 
defined in such a way that one has only a single value for the entire history. 
However, one will now have an RMS error for every history. The RMS error 
for history j is defined by: 

$$E_{RMS_j} = \left[ \frac{1}{N} \sum_{t=1}^{N} \left( R_{jt} - \hat{R}_{jt} \right)^2 \right]^{1/2} \qquad (8)$$

where N is the number of points in the data history. Since there exists an RMS
error for each history, one may define the expected average RMS error based on
all of the histories in the learning data. This is the average of the RMS error
when the algorithm is applied to the learning data, and may simply be expressed
as: 

$$\bar{E}_{RMS} = \frac{1}{M} \sum_{j=1}^{M} E_{RMS_j} \qquad (9)$$
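
As an illustration, the RMS error of each history and its average over the
learning data can be computed as follows. The function names and the sample
histories are hypothetical; the sketch simply takes the root of the mean
squared difference over the points of each history.

```python
import math


def rms_error(actual, estimated):
    """Root mean square difference between one actual and estimated history."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)


def average_rms(actual_histories, estimated_histories):
    """Return the average RMS error and the individual RMS errors."""
    errors = [rms_error(a, e)
              for a, e in zip(actual_histories, estimated_histories)]
    return sum(errors) / len(errors), errors


# Two hypothetical learning histories of four points each.
actual_histories = [[10.0, 20.0, 30.0, 40.0], [5.0, 5.0, 5.0, 5.0]]
estimated_histories = [[12.0, 18.0, 30.0, 40.0], [5.0, 8.0, 5.0, 1.0]]
avg, errors = average_rms(actual_histories, estimated_histories)
```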


The calculation of the RMS error for the special case where the estimate is the 
mean of the learning data is another example where the ADAPT formulation 
significantly simplifies the computation involved. For this special case,
equations 4 and 8, combined with the orthogonal property of the optimal
functions, $H_\ell$, give the following expression for the RMS error of the jth
history: 

$$E_{RMS_j} = \left[ \frac{1}{N} \sum_{\ell=1}^{L} Y_{j\ell}^2 \right]^{1/2} \qquad (10)$$

in terms of the coefficients of the optimal series, $Y_{j\ell}$. 

The RMS error between the nominal estimate and the one-sigma error,
$E_{RMS_\sigma}$, can also be calculated by substituting $\sigma_t$ for
$R_{jt} - \hat{R}_{jt}$ in equation 8. Although this is not identical to the
average RMS error, $\bar{E}_{RMS}$, both $E_{RMS_\sigma}$ and $\bar{E}_{RMS}$
are estimates of the expected value of $E_{RMS_j}$ for any given algorithm. 

The ADAPT programs also provide validity criteria which are based on the ability 
of the optimal functions derived from the learning data to represent the test data. 
These validity criteria are identical for, and applicable to, all ADAPT
classification, prediction and clustering algorithms. The validity criteria
essentially make use of the data vector's geometric property of length. The
length of the learning data vectors may be calculated in the original data space
and then compared with the new length when the learning data is represented in
the optimal ADAPT space. 

A similar comparison can be made between the length of the test data vector as 
it is represented in the original data space and in the optimal ADAPT space. If
the test data vector's length is reduced significantly more than that of the
learning data vectors when it is represented in the optimal space, this is an
indication that the test data is from a different population than the learning
data used to develop that algorithm. Thus, it is not valid to apply the
algorithm to that particular case. 
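
A minimal sketch of this length comparison follows. It assumes an orthonormal
set of optimal functions and mean-removed data vectors. The function names,
the data, and in particular the tolerance used to decide what counts as
"significantly more" are hypothetical choices made only for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def retained_length_ratio(vector, basis):
    """Length in the optimal space divided by length in the data space."""
    original = math.sqrt(dot(vector, vector))
    projected = math.sqrt(sum(dot(vector, h) ** 2 for h in basis))
    return projected / original


def looks_valid(test_vector, learning_vectors, basis, tolerance=0.1):
    """Accept the test vector if its length is not reduced by much more
    than the worst-represented learning vector (tolerance is illustrative)."""
    worst = min(retained_length_ratio(v, basis) for v in learning_vectors)
    return retained_length_ratio(test_vector, basis) >= worst - tolerance


# Hypothetical two-function basis in a three-dimensional data space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
learning_vectors = [[3.0, 4.0, 0.5], [1.0, 2.0, 0.2]]

similar_case = looks_valid([2.0, 1.0, 0.1], learning_vectors, basis)
foreign_case = looks_valid([1.0, 1.0, 5.0], learning_vectors, basis)
```

The second test vector lies mostly outside the space spanned by the optimal
functions, so its length collapses on projection and it is flagged as coming
from a different population.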

The validity criteria for the case of extrapolated data histories must be
modified, since the learning data is now identical to the first portion of the
data histories or sunspot cycles and was not used to make the data base.
However, the data which was used to make the base also contains the portion
covering the identical range of the indexing variable as the learning portion of
the data history to be extrapolated. One may then compute the RMS error,
$E'_{RMS_j}$, for the first portion of all the learning data histories. One may
then take the average of this, finding the average RMS error, $\bar{E}'_{RMS}$,
for all the learning data histories, and also the standard deviation,
$\sigma_E$, of these RMS errors. One may then compare the RMS error of the test
case with the average and standard deviation of the RMS error for the
corresponding region of the learning data and calculate the confidence in the
validity of the extrapolation. For example, if the RMS error of the test case
falls outside the range of the average RMS error for the learning data plus or
minus its two-sigma value, one has only 5% confidence that the extrapolation
will be accurate to the degree indicated by the performance estimate based on
the learning data. 
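
The two-sigma comparison described above can be sketched as follows. The
function name and the RMS values are hypothetical, and a 1/M normalization is
assumed for the standard deviation of the learning RMS errors.

```python
import math


def two_sigma_check(learning_rms, test_rms):
    """Compare the test-case RMS error over the known portion with the
    mean and standard deviation of the learning-data RMS errors."""
    m = len(learning_rms)
    mean = sum(learning_rms) / m
    sigma = math.sqrt(sum((e - mean) ** 2 for e in learning_rms) / m)
    inside = (mean - 2 * sigma) <= test_rms <= (mean + 2 * sigma)
    return mean, sigma, inside


# Hypothetical RMS errors over the first portion of four learning histories.
learning_rms = [4.0, 6.0, 5.0, 5.0]
mean, sigma, ok_case = two_sigma_check(learning_rms, 6.0)
_, _, doubtful_case = two_sigma_check(learning_rms, 9.0)
```

A test case that falls outside the band is the situation in which the text
assigns only 5% confidence to the extrapolation.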

The next section of this report will present the detailed results of the repre- 
sentation, prediction and extrapolation of the sunspot cycles using the methods 
which have been outlined above. 

4.0 ESTIMATE OF FUTURE SUNSPOT NUMBERS 

4.1 Representation of Sunspot Data 

The objective of the predictions which are being investigated in this study is to 
provide an estimate of the 81-day running average of the daily sunspot number. 

The daily sunspot numbers are available from 1890, or from cycle 13*, to cycle 
20. Since NASA desired to reserve cycles 19 and 20 as test cases, this would 
leave a total of 6 cycles for learning data. This is believed to be insufficient 
learning data to provide good algorithms. On the other hand, monthly sunspot 
numbers are available from 1750, or cycle 1, and if one could use the monthly 
data as learning data instead of the 81-day running average one would have 18 
learning cases instead of 6. 

If one formulates the 81-day running average of the sunspot numbers and
compares this with the 3-month running average of the sunspot numbers, one
discovers that 81 components of the two averages are identical and usually only 
9 additional components are added to the monthly average. Thus, the total 
error made in using the three-month running average instead of the 81-day 
running average is of the order of 10% of the difference between the average of
these 9 components and the 81-day running average. Since the 9 components are
adjacent to the 81-day running average, one would expect that their average
would be quite similar to the 81-day running average. Since the total expected
error is only 10% of the small difference between these averages, one would
expect a very small error in using the three-month running average as 
an approximation to the 81-day running average. This is verified by Figure 4.1,
which shows the difference between the 81-day running average and the
three-month running average evaluated every five months for cycle 19. The
maximum difference observed is less than 3 sunspots. This difference is
considerably smaller than the expected error in any of the estimates which will
be discussed in this report. Thus, we conclude that the three-month running
average is a completely adequate approximation to the 81-day running average for
the studies in this report and should be used, since the factor of 3 gain in
learning data significantly enhances the probability of success in this study. 
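
The 10% argument can be checked with a short worked example. The daily values
below are hypothetical, and a three-month window is taken to be 90 days for
the purpose of the illustration (actual three-month spans vary by a few days).

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical daily sunspot numbers for a 90-day (three-month) window.
daily = [50 + (d % 7) for d in range(90)]

# The 81 days shared by both windows and the 9 days that only the
# three-month window contains.
shared, extra = daily[:81], daily[81:]

avg_81 = mean(shared)
avg_90 = mean(daily)
difference = avg_90 - avg_81

# Since avg_90 = (81 * avg_81 + 9 * mean(extra)) / 90, the difference is
# one tenth of the gap between the 9-day average and the 81-day average.
predicted = (9 / 90) * (mean(extra) - avg_81)
```

The identity holds for any data, so the approximation error is always one
tenth of the gap between the average of the 9 extra days and the 81-day
average.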

The first step in an ADAPT analysis is to find the optimum representation of the 
data histories which will be used to predict any given quantity. For the case of 
the sunspot estimates there are two data bases of interest. The first data base 
is made up of single eleven-year cycles. This data base is required for both the 
extrapolation and the prediction approaches to estimating future sunspot cycles. 

In both of these cases this base will be used in an unconventional manner when 
compared to the normal ADAPT procedures. In particular, for the prediction 
of future sunspot cycles based on the preceding two sunspot cycles, this base 
will be used to reduce the approximately 180 numbers required to define a 
sunspot cycle to two numbers. Thus, we will only have to make two prediction 
algorithms rather than 180 prediction algorithms to predict the sunspot cycle. 

*The starting dates for the cycles used in this study are given in Table 4.1. 

The extrapolation of sunspot cycles only requires one base to carry out extrapo- 
lation as defined in Section 3. For this study the single cycle base will be used 
for the extrapolation. The double cycle base is the classical ADAPT base which 
will be used to predict the two numbers required to define the next sunspot cycle. 

Figures 4.2 through 4.9 define the significant characteristics of the single
cycle base. Figure 4.2 is the average of sunspot cycles 1 through 18, which were
used to make the single cycle base. This average cycle was first subtracted from 
each of the 18 cycles and then these data histories were processed through the 
ADAPT programs to find the optimum empirical orthogonal functions to represent 
the sunspot cycles. 

Figure 4.3 shows the amount of information energy, or the amount of variation 
from the average input vector, which is explained by each of the terms in the 
optimum generalized Fourier series expansion of the sunspot cycles. The first 
term in the optimal series explains slightly more than 60% of the variation from 
the average cycle. The second term explains approximately 18% of the variation, 
and thus the first two terms explain nearly 80% (as illustrated by the upper curve 
in Figure 4.3) of the variation in the sunspot cycles. Approximately another 10% 
of the variation is explained by the third through sixth terms of this optimal 
series. Examination of Figure 4.3 indicates that the remaining eleven terms in 
the expansion provide very little additional physical information. This follows 
from the term-by-term, or lower, curve in Figure 4.3, which shows that from the 
seventh through the seventeenth term in the series the change in information 
energy as one goes from term to term is approximately equal. 
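
The two curves of such a plot can be sketched from the coefficients of the
learning histories. The sketch assumes that the information energy of a term
is the sum over all learning histories of the squared coefficient of that
term; the function name and the coefficient values are hypothetical and are
not the sunspot data.

```python
def explained_variation(coefficients):
    """coefficients[j][l] is component l of mean-removed learning history j.
    Returns the per-term and cumulative fractions of explained variation."""
    n_terms = len(coefficients[0])
    energy = [sum(row[l] ** 2 for row in coefficients) for l in range(n_terms)]
    total = sum(energy)
    per_term = [e / total for e in energy]
    cumulative = []
    running = 0.0
    for p in per_term:
        running += p
        cumulative.append(running)
    return per_term, cumulative


# Hypothetical coefficients: 3 histories, 3 terms.
coefficients = [[4.0, 1.0, 1.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
per_term, cumulative = explained_variation(coefficients)
```

The per-term list corresponds to the lower (term-by-term) curve and the
cumulative list to the upper curve.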

The first optimal function, representing about 60% of the variation from the 
average, is shown in Figure 4.4. The most striking feature of this first optimal 
function is that the great majority of the variation explained by this function 
occurs in the first half of the cycle (i.e., before month 76). This is extremely 
significant since more than half of the variation of the cycles from the average 
is explained by this function. This implies that a minimum of slightly more than 
half of the entire variation from sunspot cycle to sunspot cycle occurs in the 
first half of the cycle. Moreover, this variation is the most highly correlated 
of all the variations occurring and is thus the most easily predicted. 

Examination of the second optimal function, shown in Figure 4.5, shows that this 
function, which explains the next 18% of the variation, provides approximately 
equal correction over the entire span of the sunspot cycle. Combining this with 
the conclusions from the first optimal function, one can estimate that between 
70 and 80% of the variation between sunspot cycles occurs during the first 76 
months of the sunspot cycles. This implies that extrapolations of the sunspot 
cycles should be quite good if one uses the first 76 months as a basis for the 
extrapolation. It also implies that if one is going to predict the sunspot cycle 
from the preceding sunspot cycles, the prediction of only the first coefficient 
will result in a very accurate prediction for the first 76 months of the sunspot 
cycle but will yield a prediction which is little better than using the mean of 
all of the cycles for the last half of the sunspot cycle. Thus, it is believed
that the minimum number of terms which must be predicted to provide a
significant improvement in the second half of the sunspot cycle is two terms of
the optimal series. This is consistent with the results of Figure 4.3, which
shows that 80% of the variation will be explained by using these two terms. 

Examination of Figure 4.3 indicates that the third through sixth terms, although 
making a relatively small contribution to the variation, might still contain some 
significant information. These four optimal functions are presented in Figures 
4.6 through 4.9 and have the general characteristic that they apply approximately 
equally throughout the entire sunspot cycle. They also have the characteristic 
that they define specific spikes in the sunspot cycle. This follows from the spiky 
nature of these optimal functions. The one exception to this is in the rearmost 
portion of the sunspot cycle, from approximately month 100 through month 160, 
where optimal functions three, four and six each appear to make a uniform
contribution to the last portion of the sunspot cycle. This indicates that these
three optimal functions may be providing a correction to the length of the
sunspot cycle. 

Figures 4.10 through 4.15 present similar information for the double cycle base. 
The double cycle base is constructed using two adjacent eleven-year cycles, or 
approximately 22-year cycles, of sunspot numbers. Just as in the case of the 
single cycle base, one still desires to keep the nineteenth and twentieth cycles 
as test cases, and thus the learning data for predictions of future cycles based 
on the preceding two cycles was limited to that data required to predict cycles 
three through eighteen. Since the two preceding cycles are not available for 
predicting cycles one and two, the number of learning cases available to make 
this base is reduced by two from the number of learning cycles available for 
producing the single cycle base. This base was constructed using cycles one-two, 
two-three, three-four, and so on up to cycles sixteen-seventeen. Note that the
double cycle seventeen-eighteen, although available for use in the
representation, would not be available as a predictor in the learning data since
this double cycle would be used to predict cycle nineteen, which is being
withheld as proof data. 
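
The bookkeeping of the double cycle learning cases can be made explicit with a
small sketch. The function name is hypothetical; only the cycle numbers given
in the text are used.

```python
def double_cycle_cases(first_cycle, last_target):
    """Pair each two adjacent predictor cycles with the cycle that follows."""
    cases = []
    for target in range(first_cycle + 2, last_target + 1):
        cases.append(((target - 2, target - 1), target))
    return cases


# Cycles 1 through 18 are learning data; cycles 19 and 20 are withheld.
cases = double_cycle_cases(first_cycle=1, last_target=18)
```

This yields 16 learning cases, from the pair (1, 2) predicting cycle 3 through
the pair (16, 17) predicting cycle 18, which is two fewer than the 18 cases of
the single cycle base.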

The average of the double cycles constructed from cycles 1 through 17 by the
method outlined above is presented in Figure 4.10 and is the average 
input vector which was subtracted from each of the double cycle learning cases 
prior to constructing the optimal orthogonal functions for representing these 
histories. Since there are only 16 double cycles in the learning data, a total
of 15 optimal functions is sufficient to completely explain the variation in
the data. The amount of explained variation as a function of the number of
optimal functions (i.e., number of dimensions in optimal space) used is
presented in Figure 4.11. 

Here we see that for the double cycle the first optimal function only explains 
about 37% of the variation. However, the second, third and fourth optimal 
functions explain 25, 12 and 9% of the variation, respectively, which is 
considerably greater than the variation explained by the corresponding functions 
in the single cycle base. 

Examination of Figure 4.11 shows that there are breaks or changes in the 
slope of the explained variation occurring after the fourth, sixth, and twelfth 
optimal functions. Thus, the most likely groupings of interesting information 
consist of either the first through fourth optimal functions, the first through 
sixth optimal functions, or the first through twelfth optimal functions. Actually, 
it is quite likely that the last group, the seventh through twelfth optimal
functions, explains peculiar characteristics of this set of learning data and
will not be useful for analysis of sunspots. Thus, one could guess that the
first six optimal functions would contain information which might be useful for
predicting future sunspot cycles. We shall see in Section 4.2 that this is
indeed the case. Thus, let us consider the first six optimal functions in more
detail. 


Comparison of the first optimal function for the double cycle base, presented in 
Figure 4.12, with the single cycle first optimal function, presented in Figure
4.4, shows that the most highly correlated portion of the variation is still
from the first portions of the two eleven-year cycles which combine to make up
the double cycle. The second optimal function for the double cycle base, shown
in Figure 4.13, has the same characteristic, and thus it is now taking two
optimal functions to explain the variation occurring in the first 76 months of
the eleven-year cycles. 

The third and fourth optimal functions, presented in Figures 4.14 and 4.15, 
provide relatively uniform corrections over the entire cycle and thus play a role 
similar to the second and third optimal functions of the one cycle base. Since 
the first two optimal functions, explaining a total of 62% of the information,
deal with essentially the first 76 months of the eleven-year cycle, we have a
further confirmation of the result obtained from examination of the single cycle
optimal functions that approximately 70 to 80% of the variation occurring over
the eleven-year sunspot cycle occurs in the first 76 months of the cycle. The
fifth and sixth optimal functions, presented in Figures 4.16 and 4.17, appear to
make detailed corrections to the oscillations of the sunspot cycle and possibly
minor adjustments to the length of the sunspot cycle. 

Thus, we conclude that one should estimate at least two coefficients and no more 
than six coefficients of the optimal Fourier series representation of each of the 
cycles to be reconstructed. If these coefficients are to be estimated from the
preceding two cycles, the preceding two cycles should be represented by at least
three and not more than six terms in the optimum generalized Fourier series
representation. We have also seen that the three-month running average of the
monthly averages is a reasonable approximation to the 81-day running average.
Thus, the remainder of Section 4 will explore the use of these two
representations of the three-month running average to estimate future sunspot
numbers either by extrapolation of the current cycle to its end or by predicting
the future cycle from the preceding two cycles. 

4.2 Predictions from the Preceding Two Cycles 

Prior to constructing the reference algorithm both the dimensionality and the 
type of regression must be selected. For the prediction of sunspot cycles there 
are two dimensionalities which must be considered. The first is the number of 
terms which will be used in reconstructing the new sunspot cycle. The analysis 
of the ADAPT single cycle representation presented in the preceding section has 
already indicated that the reconstruction should be based upon between two and 
six terms in the optimal series. The second dimensionality which must be
considered is that of the space in which the regression algorithm is derived.
Again, the analysis in the preceding section indicated that the dimensionality
of this base should lie between 3 and 6. Thus, the remaining task is to select
the best dimensionality within these ranges. 

There are two general types of regression algorithms available in the ADAPT 
programs. The first is a canonical regression which amounts to a simultaneous 
regression between the independent variables and all of the dependent variables 
which are to be predicted. In the present case the number of dependent variables 
to be predicted is equal to the number of terms which will be used to reconstruct 
the sunspot cycle. A classical multiple regression which fits each dependent 
variable separately to all of the independent variables individually is also available 
in the ADAPT programs. The advantages of the canonical regression are twofold. 
First a single processing derives the algorithms for all of the dependent variables, 
thus saving computer time and manpower in deriving the algorithms. The more 
important consideration is that the simultaneous fitting of all of the dependent 
parameters to the independent variables makes it more difficult for the mathematics 
to make a fortuitous fit to the data which will not be applicable to the test cases. 

The disadvantage of the canonical regression (i.e., the advantage of the classical 
multiple regression) is that the canonical regression results in a slightly larger 
least squares error between the estimated and actual values and thus does not
provide as small a value of $\sigma_{RAT}$ as is provided by the classical
multiple regression technique. 

Since the canonical regression is less expensive to apply to a large number of 
dependent variables, the first step in further refining the estimate of the
dimensionality to be used was to apply the canonical regression for several
different dimensionalities. The results of this are included in Table 4.2.
Algorithms were derived using 4, 6 and 8 dimensions. In the case of the
4-dimensional algorithm, 4 coefficients were predicted. In 6 dimensions 2
algorithms were derived, one for predicting 4 coefficients simultaneously and
the other for predicting 6 coefficients simultaneously, and 6 coefficients were
predicted in the 8-dimensional space. In each of these cases the performance of
the algorithm for predicting each coefficient, as measured by the standard
deviation of the error in the estimate relative to the standard deviation of the
input data about its mean, $\sigma_{RAT}$, and by the correlation coefficient,
is summarized in Table 4.2. The expected reduction in the standard deviation
about the mean for the estimate of the entire sunspot cycle using the first few
terms of the series is also presented in Table 4.2 and is designated by the
quantity $\sigma_{RAT}$. This quantity is simply calculated by summing the
reduction in explained variation, $E_\ell$, times the value of
$\sigma_{RAT_\ell}$ for each term used in the estimate and adding to this the
amount of the explained variation which is not included in the estimate. Thus 
$\sigma_{RAT}$ is given by: 



( 11 ) 


This quantity, $\sigma_{RAT}$, multiplied by the RMS error between the mean
cycle and the learning cycles should approximate the RMS error for the
prediction using Q dimensions. Table 4.2 also gives the performance of these
canonical algorithms in predicting cycles 19 and 20. This performance is
summarized as the root mean square error (see Section 3) between the estimated
and actual values for these two cycles. 
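
The verbal description of the combined ratio can be turned into a short sketch.
Equation 11 itself is not legible in this copy, so the code below follows the
wording of the text literally (a plain weighted sum, with no squares); whether
the original equation used squared quantities is not known. The function name
and all numerical values are hypothetical.

```python
def combined_sigma_rat(explained, sigma_rat_terms):
    """explained[l] is the fraction of variation explained by term l, given
    for all L terms and summing to 1. sigma_rat_terms[l] is the ratio for
    each of the Q predicted terms."""
    q = len(sigma_rat_terms)
    # Predicted terms: explained variation weighted by each term's ratio.
    predicted_part = sum(e * s for e, s in zip(explained[:q], sigma_rat_terms))
    # Terms beyond Q are not estimated, so their variation is added in full.
    unpredicted_part = sum(explained[q:])
    return predicted_part + unpredicted_part


# Hypothetical case: L = 4 terms, of which Q = 2 are predicted.
explained = [0.6, 0.2, 0.1, 0.1]
sigma_rat_terms = [0.5, 0.8]
ratio = combined_sigma_rat(explained, sigma_rat_terms)

# Approximate prediction RMS error for a hypothetical RMS error of 30
# between the mean cycle and the learning cycles.
approx_rms = ratio * 30.0
```

With no terms predicted the ratio is 1, so the estimate reduces to the RMS
error about the mean cycle, as expected.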


The canonical results presented in Table 4.2 lead to two conclusions. The first 
conclusion is that the prediction algorithms should be derived in the ADAPT 
optimum six-dimensional space. The second is that algorithms derived from higher 
dimensional spaces will tend to be overdetermined; that is, a significant portion 
of the performance of the algorithm on the learning data is due to a fortuitous fit 
to the data and not to the physics of the problem. The first of these conclusions
is reached by noting that the best predictions of the entire sunspot cycle, as
indicated by either $\sigma_{RAT}$ or the RMS error for cycles 19 and 20, which
have been circled in Table 4.2, all occur for algorithms derived in a
6-dimensional space. 

This conclusion is further supported by examination of the relative importance 
spectra for predicting the 6 coefficients, which are presented in Figures 4.18 
through 4.23. The relative importance spectrum is related to the spectrum in 
classical Fourier analysis, except that the trigonometric functions have now been 
replaced by the optimum empirical orthogonal functions and frequency no longer 
has a physical interpretation but merely is a number identifying the term in the 
generalized Fourier series. The relative importance spectrum tells the importance 
of each of the optimal dimensions, in this case 6, to the particular algorithm in 
question. Thus, examination of the relative importance spectrum presented in 
Figure 4.18 indicates that the most important direction for calculating the first 
coefficient is the fifth optimal direction and that the second and sixth optimal 
directions make significant contributions to this prediction. Thus, it is clear
that if one were to use fewer than 5 dimensions, there would be a significant
increase in the error associated with the prediction of the first coefficient of
the next sunspot cycle. Similarly, Figure 4.19 shows that the sixth dimension is
dominant in predicting the second coefficient and that the error in the
prediction of the second coefficient would be significantly increased if all six
coefficients were not used. 

The same conclusion applies to the prediction of the third coefficient, as can be 
seen by examination of Figure 4.20. In general, by examining Figures 4.21 
through 4.23 we see that the fifth and sixth optimal directions make significant 
contributions to all the predictions and thus should be retained. 

The onset of the "overdetermined" condition as one moves from six optimal
dimensions to eight optimal dimensions can be seen by noting that although the
performance on the learning data, as indicated by σ_rat and the RMS error, improves
significantly as one increases the dimensionality of the space in which the
algorithm is derived from 6 to 8, the performance of the algorithms on the independent
test cases (i.e., cycles 19 and 20) decreases; that is, the RMS error is larger
for the algorithm derived in the 8 dimensional space than in the 6 dimensional
space. This is the characteristic of an overdetermined algorithm; namely, it
has a significantly better performance on the learning data than on independent
test cases.
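
The test for the overdetermined condition described above can be sketched in a
few lines of Python. This is only an illustration and is not part of the ADAPT
programs; the function names are hypothetical and the RMS errors are made-up
numbers, not the entries of Table 4.2.

```python
def shows_overdetermination(learn_rms, test_rms, dim_low, dim_high):
    """True if going from dim_low to dim_high lowers the learning error
    but raises the error on the independent test cases."""
    learning_improves = learn_rms[dim_high] < learn_rms[dim_low]
    test_degrades = test_rms[dim_high] > test_rms[dim_low]
    return learning_improves and test_degrades


def best_test_dimension(test_rms):
    """Dimension with the smallest RMS error on the independent test cases."""
    return min(test_rms, key=test_rms.get)


# Hypothetical RMS errors keyed by dimensionality (NOT the values of Table 4.2).
learn_rms = {4: 30.0, 6: 22.0, 8: 15.0}
test_rms = {4: 35.0, 6: 28.0, 8: 33.0}

print(shows_overdetermination(learn_rms, test_rms, 6, 8))
print(best_test_dimension(test_rms))
```

With these illustrative numbers the step from 6 to 8 dimensions is flagged, and
6 dimensions gives the smallest test error, which mirrors the reasoning applied
to cycles 19 and 20.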

Based on these results 6 dimensions of the optimal space were used to derive 
the prediction algorithm. For this reason the classical multiple regression 
algorithms were only applied in 6 dimensions. The results of the application 
of these algorithms are also shown in Table 4.2 for the prediction of 4 coefficients
in 6 dimensions. Note that since each coefficient is predicted independently, sets 
B and C are identical for the multiple regression (i.e., M.R.) algorithms. This 
is indicated by the X's in the performance regions of set C. Examination of the 
performance of the 6 dimensional algorithms for predicting the entire sunspot 
history reveals the interesting fact that the best performance for cycle 20 occurs 
when only 2 coefficients are predicted. The σ_rat also indicates that the greatest
gain in prediction accuracy is achieved in the first two coefficients, since the
decrease in this parameter as one goes from the second to third or third to fourth
coefficients is quite small. Thus, we have an indication that one should use two
coefficients for reconstructing the predicted sunspot cycle. The discussion of
Table 4.2 should have made clear the advantages, claimed in Section 3, of a single
performance criterion for measuring both the performance of the prediction
algorithm and the degree of matching between the predicted and actual sunspot
cycles.

Table 4.2 also provides a basis for selecting the type of regression to be used.

Since the major advantage of the canonical regression is in reduction of the 
likelihood of the "overdetermined" condition and since the standard multiple 
regression algorithm always performs better on the learning data, the only 
justification for using the canonical algorithm is its performance on the test
data. If it performs better, it is an indication that a significant portion (i.e.,
sufficient to account for the difference between the multiple and canonical
regression) of the performance observed on the learning data is due to the
"overdetermined" nature of the algorithm. Examination of the performance of the
multiple and canonical regressions shows that for this case this is not true. In 
fact for cycle 20 the multiple regression algorithm has significantly better per- 
formance than the canonical algorithm. For cycle 19 the canonical algorithm 
has slightly better performance than the multiple regression; however, as will 
be discussed in Section 4.3, cycle 19 is an anomalous cycle and is probably not
a valid cycle for making decisions as to the best way to construct the prediction 
algorithms. 

Thus, the prediction of the future sunspot cycle will be based on the use of the
classical least squares multiple regression applied in the first six dimensions
of the ADAPT optimal space to predict the first two coefficients of the single
cycle generalized Fourier series representation of the sunspot cycle. The
accomplishment of this prediction may be divided into two parts: 1) the
prediction of the first two coefficients of the sunspot cycle and 2) the reconstruction
of the sunspot cycle using these first two coefficients.

The relative importance spectra for the algorithms recommended for predicting
the first two coefficients of the next sunspot cycle are presented in Figures 4.24
and 4.25 for the first and second coefficients respectively. These may be compared
with the relative importance spectra obtained for the corresponding coefficients
using the canonical regression, which were presented in Figures 4.18 and 4.19.
Comparison of Figures 4.18 and 4.24 shows that both types of regression give very
similar relative importance spectra and therefore similar algorithms for predicting
the first coefficient. Comparison of Figures 4.19 and 4.25 shows that the sixth
optimal direction is dominant for both the canonical and least squares multiple
regression algorithms for predicting the second coefficient. However, the canonical
prediction made considerably more use of the first, third and fourth coefficients
than the least squares multiple regression algorithms.

It is interesting to note that the prediction of the first coefficient is primarily 
based upon a term containing about 30 percent of the variation in the sunspot
cycles and has significant contribution from the first portion of each of the
preceding two cycles. On the other hand, the prediction of the second coefficient
is based on only approximately 3 percent of the variation of the data and has a 
relatively uniform contribution from both the first and second halves of both the 
preceding cycles. These conclusions are reached by comparing the information 
energy and relative importance spectra with the corresponding optimal functions 
presented in Figures 4.13 and 4.17.

Figures 4.26 and 4.27 present the relative importance vectors for these two
algorithms. These relative importance vectors represent the vectors which,
when multiplied (dot product) by the sunspot numbers associated with the
preceding two cycles, will yield a number equal to the coefficient for the next
sunspot cycle. Thus, the relative importance vector is the algorithm for predicting
the coefficients of the next sunspot cycle, and as such also defines the importance
of each portion of the sunspot cycle for predicting the next sunspot cycle. These
same relative importance vectors are included in the tabulation of the algorithms
which is presented in Table 4.3. Table 4.3 has been constructed so that it may
be used independently of this report to calculate the coefficients of the next
sunspot cycle.
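
As a minimal sketch of how a relative importance vector is applied, the dot
product can be written in Python as follows. The function name and the
three-element vectors are hypothetical and are used only to show the arithmetic;
the real vectors are those tabulated in Table 4.3, and any mean removal or
scaling prescribed there is not reproduced here.

```python
def predict_coefficient(importance_vector, preceding_two_cycles):
    """Dot product of a relative importance vector with the sunspot numbers
    of the two preceding cycles (both sampled at the same months)."""
    if len(importance_vector) != len(preceding_two_cycles):
        raise ValueError("vector and sunspot history must have equal length")
    return sum(w * r for w, r in zip(importance_vector, preceding_two_cycles))


# Hypothetical three-month example (NOT the values of Table 4.3).
importance_vector = [0.5, -0.25, 0.125]
preceding_two_cycles = [10.0, 20.0, 40.0]

print(predict_coefficient(importance_vector, preceding_two_cycles))  # 5.0
```

Each coefficient has its own importance vector, so the same function would be
called once per coefficient with the same sunspot history.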

The second step in constructing the sunspot cycle consists of utilizing the pre- 
dicted coefficients in conjunction with the first and second optimal functions to 
reconstruct the sunspot cycles. The detailed procedure for this is outlined in 
Table 4.4. Briefly, this procedure consists of taking the average sunspot cycle
presented in Figure 4.2 and adding to it, for each month, the product of the first
coefficient times the corresponding value of the first optimal function for that 
month plus the product of the second coefficient times the corresponding value 
of the second optimal function for that month. This procedure is carried out for 
each month in the cycle and the result will produce the predicted sunspot cycle 
history. This has been accomplished for the predictions of the learning data 
(cycles 3 through 18), the predictions of the proof test cycles (cycles 19 and 20), 
and for cycle 21. The resulting reconstructions for the learning data are presented
in Appendix D. We will now discuss the reconstruction of the proof test cases. 
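
The reconstruction step just described amounts to one line of arithmetic per
month. The sketch below assumes the mean cycle and the first two optimal
functions are available as equal-length monthly lists; the four-month values are
invented for illustration and are not taken from Figure 4.2 or Table 4.4.

```python
def reconstruct_cycle(mean_cycle, phi1, phi2, c1, c2):
    """Predicted sunspot number for each month:
    mean cycle + c1 * first optimal function + c2 * second optimal function."""
    return [m + c1 * a + c2 * b for m, a, b in zip(mean_cycle, phi1, phi2)]


# Hypothetical four-month example (NOT the report's functions).
mean_cycle = [10.0, 50.0, 40.0, 5.0]
phi1 = [-1.0, -2.0, 0.0, 0.0]
phi2 = [0.0, 1.0, 2.0, 1.0]

print(reconstruct_cycle(mean_cycle, phi1, phi2, c1=5.0, c2=-2.0))
# [5.0, 38.0, 36.0, 3.0]
```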

Examination of Figure 4.26 shows that the decision to use the two preceding
cycles rather than just a single preceding cycle to predict the future sunspot 
cycle was a wise one. We see that the second preceding sunspot cycle has 
slightly more influence on the prediction of the first coefficient of the sunspot 
cycle than the immediately preceding sunspot cycle. In particular, the second 
half of the first of the two preceding sunspot cycles makes a significantly greater 
contribution to the prediction than the corresponding second half of the sunspot 
cycle immediately preceding that being predicted. One also can see that if the 
preceding two cycles decrease in amplitude the first coefficient will tend to be 
larger than if the preceding two cycles have increasing amplitude. Examination 
of Figure 4.4 shows that if the first coefficient is larger, the first portion of
the sunspot cycle will tend to have lower sunspot numbers than the mean cycle.
Thus, one may make the general observation that if the first fifty months of 
the preceding sunspot cycle have lower sunspot numbers than the corresponding 
fifty months of a sunspot cycle, the next sunspot cycle will tend to have a 
relatively slow rise in sunspot numbers as compared to the mean cycle. 

Figures 4.28 and 4.29 measure the performance of each of these algorithms
for predicting the coefficients of the learning data. The ordinate in these figures
is the estimated value of the coefficient whereas the abscissa is the actual value
of the coefficient; thus the solid line drawn on these figures represents a perfect
prediction. The dashed lines have been placed on these figures to indicate the
approximate bounds of the error in the coefficient which would yield an error
in sunspot number of ±20.

Since the sunspot cycles are being represented by two numbers, namely, the 
coefficients of the first and second terms in the generalized Fourier series 
representation of the sunspot cycle, it is possible to display the cycles on a 
two dimensional graph. Figure 4.30 is such a display. This display is known
as a scatter plot display and is simply a plot of the second coefficient of the 
optimal generalized Fourier series representation versus the first coefficient 
of this representation for each history. Thus, each sunspot cycle appears as a 
single point on this plot. The scatter plot presented in Figure 4.30 is
constructed on the single cycle base and thus represents 80 percent of the variation
or information contained in the sunspot cycles. We shall see later that this 
plot is very useful for studying groupings of sunspot cycles but it is also useful 
for comparing estimated and actual sunspot cycles. Figure 4.30 shows all of 
the actual sunspot cycle locations for cycles 1 through 20. These actual locations 
are indicated by the circles with the sunspot cycle number shown inside the circle.
The estimated position of each sunspot cycle is indicated by the sunspot cycle
number enclosed in a square. If the estimated and actual sunspot cycles were to
fall on the same place in this scatter plot, that would indicate that the two term
reconstructions would be identical. Since two terms of the optimal representation
account for 80 percent of the variation in the sunspot cycles, such agreement
would indicate a very good prediction. This scatter plot shows that the prediction
of cycle 20 is rather typical of the predictions in the learning data. The
prediction of cycle 19 is not very typical of the accuracy of the predictions in
the learning data. However,
since the actual value of cycle 19 is far removed from any other cycle on this 
plot there is a strong indication that cycle 19 is anomalous. 

Table 4.5 compares the RMS error of the learning data and the proof test cases
with the average and standard deviation of this RMS error. This table also
shows the value of the ADAPT validity parameter, Q, for the two cycles used for
each of the predictions. This verifies that cycle 20 is an extremely good
prediction. Examination of the representation criterion (Q) indicates that cycle 19
has a relatively poor representation, namely .74 as compared to an average
representation of .83 with a standard deviation about this of .13. This
representation can be taken as an indication that one should exercise some caution
in utilizing the prediction for cycle 19.

The value of .74 for the validity criterion is sufficiently low that it would not
pass the more severe validity test of requiring that the representation be greater
than the mean representation of the learning data. However, this severe criterion
will limit the applicability of the prediction algorithm to a maximum of
approximately half of the cases to which it would be applied. A more reasonable
validity criterion for situations such as this, where there is only one estimate
upon which to make a decision, is the mean minus 1 or 2 standard deviations. The
validity criterion value of .74 would pass either of these two less severe but more
realistic representation requirements. It appears from this that in terms of the
predictive algorithms one can probably have high confidence in those predictions
which have a representation test or Q value greater than approximately 80 percent.
For Q values less than 80 percent one must still use the predictions even if the
confidence is lower, since approximately half of the valid cases will have such a
value, but it is possible that invalid cases would also be in this region. Thus,
some caution must be exercised when one observes a validity criterion below .8.
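
The three validity thresholds discussed above can be checked directly with the
values quoted for cycle 19 (Q = .74, learning mean .83, standard deviation .13).
The function name below is hypothetical; only the three numbers come from the
text.

```python
def passes_validity(q, mean_q, std_q, n_sigma):
    """True if Q exceeds the learning-data mean minus n_sigma standard deviations."""
    return q > mean_q - n_sigma * std_q


Q_CYCLE_19 = 0.74   # representation of the cycles used to predict cycle 19
MEAN_Q = 0.83       # mean representation of the learning data
STD_Q = 0.13        # standard deviation about that mean

for n_sigma in (0, 1, 2):
    threshold = MEAN_Q - n_sigma * STD_Q
    print(n_sigma, round(threshold, 2), passes_validity(Q_CYCLE_19, MEAN_Q, STD_Q, n_sigma))
```

The thresholds work out to .83, .70 and .57, so the value of .74 fails the
severe test and passes the two less severe ones, as stated.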

Figure 4.31 shows the predicted sunspot number (solid line) and two sigma
bounds (dashed lines) on this prediction for cycle 19. This prediction is
compared with the actual sunspot numbers (solid line) for cycle 19 in Figure 4.32.

We see that there is a great discrepancy between the actual and predicted sun- 
spot values for cycle 19, especially over the first 76 months of the sunspot cycle. 
This is entirely consistent with the scatter plot positions shown in Figure 4.30
for this cycle. The first optimal function presented in Figure 4.4 was completely 
dominated by the first 76 months of the cycle. Thus, a large error in the first 
coefficient of cycle 19 would result in an extremely large error in estimating 
the sunspot numbers over the first 76 months of the sunspot cycle. In particular, 
since the estimated value of the first coefficient is considerably larger than the 
actual value one would expect the prediction to significantly underpredict the
sunspot numbers for the first 76 months of the sunspot cycle. On the other hand, 
the prediction of the second coefficient is considerably better than the first coeffi- 
cient and one would expect that the second half of the sunspot cycle as well as the 
length of the period might be predicted considerably more reasonably. Examination 
of Figure 4.32 shows this to be the case. In fact, since the minimum of the mean
cycle is approximately five sunspots one should discontinue the predicted or dashed
curve in Figure 4.32 when it crosses the five sunspot value. This occurs at
approximately 136 months, which compares with the actual period of 125 months,
an error of just slightly less than a year. Furthermore, the actual sunspot
numbers from approximately the 80th through the 120th month are in good
agreement with the prediction. Figure 4.33 compares the actual sunspot number
(solid line) for sunspot cycle 19 with the two sigma bounds on the prediction. 

Again we see that cycle 19 is extremely anomalous and as we compare the ADAPT 
results with other results we shall see that cycle 19 is indeed an anomalous cycle 
which should be included in the base but which is not likely to reoccur for at least 
50 and possibly 150 or more years.

Figure 4.34 presents the predicted sunspot numbers (solid line) and two sigma
bounds (dashed lines) for sunspot cycle 20. This prediction was made
using cycles 18 and 19, and is more typical of the performance observed on the
learning data, as can be seen from Figure 4.30. Examining Figure 4.35, which
compares the actuals to date (solid line) with the prediction (dashed lines), one
sees that the prediction for cycle 20 is indeed quite good. Figure 4.36 compares
the actual values (solid lines) of cycle 20 with the two sigma bounds (dashed lines) 
on this prediction. We see that the two sigma bounds have been exceeded once 
at about 38 months after the beginning of the cycle. In evaluating the meaning of 
these two sigma bounds one must remember that the present predictions are made 
on a monthly basis and therefore for the typical cycle there are approximately 
132 opportunities to exceed these bounds. Since the two sigma bound is the 95 
percent confidence bound one would expect five to ten months during each typical 
cycle in which the actual values would exceed these two sigma bounds. Thus, 
the performance of cycle 20 tends to verify the validity of the two sigma bounds. 
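
The expected number of two sigma exceedances follows from simple arithmetic,
sketched below. The sketch treats the months as independent trials, which is an
assumption made here for illustration (successive running averages are
correlated), and the short bound lists are invented values, not data from
Figure 4.36.

```python
def expected_exceedances(n_months, confidence=0.95):
    """Expected number of months falling outside a bound of the given confidence."""
    return n_months * (1.0 - confidence)


def count_exceedances(actual, lower, upper):
    """Number of months in which the actual value lies outside [lower, upper]."""
    return sum(1 for a, lo, hi in zip(actual, lower, upper) if a < lo or a > hi)


print(round(expected_exceedances(132), 1))  # 6.6 months for a typical 132-month cycle

# Hypothetical four-month example (NOT the data of Figure 4.36).
actual = [10.0, 50.0, 90.0, 40.0]
lower = [5.0, 40.0, 60.0, 45.0]
upper = [20.0, 70.0, 85.0, 60.0]
print(count_exceedances(actual, lower, upper))  # 2
```

The figure of about 6.6 months is consistent with the five to ten months quoted
above for a typical cycle.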

The same algorithm was used to predict cycle 21, and the predicted values of
the coefficients for cycle 21 are presented on the scatter plot in Figure 4.30.
The actual predictions for this cycle as well as the two sigma bounds about this
prediction have been included in Figure 2.1 and represent the best estimate for
cycle 21. This prediction will be discussed in more detail in Section 4.4.

4.3 Extrapolation of Sunspot Cycles 

The extrapolation of sunspot cycles will be carried out using the single cycle
base in the manner outlined in Section 3.3. As discussed in Section 4.1,
examination of the single cycle base showed that one should use at least two and
no more than six dimensions for the extrapolation of the history. To evaluate the
effect of dimensionality on the performance of the extrapolations we shall use
the parameter σ_rat as defined in equation 11. Figure 4.37 presents this
quantity for each of the three extrapolations which will be carried out in this
section. The dashed line in Figure 4.37 is the result that would be obtained if
the first term on the right hand side of equation 11 were zero. In other words,
this is the result that one would obtain as a function of the number of terms
used if the estimates of all of the coefficients obtained by the extrapolation were
perfect. Actually, the estimates of the coefficients will have some error, and
as the number of terms increases one would expect the prediction to improve.
Thus, the value of σ_rat should decrease until the point is reached where the
extrapolation procedure no longer reduces the error in the estimated coefficient.
At this point the performance of the algorithm will degrade until the overdetermined
characteristic sets in. When this occurs the curve will approach the dashed line.
This behavior is illustrated by the solid or third quarter extrapolation in Figure
4.37. The first and second quarter extrapolations have only been carried through
to their first minimum since one should use the number of terms at which this
minimum occurs for the extrapolation.

This study will consider three different extrapolations. The first will use
the first 38 months of the cycle to extrapolate the entire cycle, and is
designated as the first quarter extrapolation. The second quarter extrapolation
uses approximately half of a typical cycle, or the first 76 months. The
third quarter extrapolation uses 93 months of the cycle to extrapolate the entire
cycle. This time for the third quarter extrapolation was picked so that there
would be sufficient data to extrapolate the 20th cycle using this approach. In
order to withhold cycles 19 and 20 as proof test cases these cycles were not
included in the single cycle base for the extrapolation. The locations of the
first minima on the curves in Figure 4.37 suggest that one should use two terms
for the first and second quarter extrapolations and four terms for the third
quarter extrapolation.

Extrapolations were formed for the learning data, cycles 1 through 18,
the proof test data, cycle 19, and the test data, cycle 20. The extrapolation
for all of these cases is carried out in the same manner. The portion of the
cycle to be used as the basis for extrapolation, i.e., the first 38 months, the
first 76 months or the first 93 months, is substituted into the linear
relationship between the coefficients to be predicted, the optimal orthogonal
functions and the values of the sunspot number. One equation is obtained for each
of the months for which data is available for extrapolation. Thus, we have 38, 76
and 93 equations for determining the two, two, and four unknowns for the first
quarter, second quarter and third quarter extrapolations, respectively. This
overdetermined problem is solved by a standard least squares fit procedure to
determine the best coefficients to satisfy the entire set of equations. These
coefficients are then assumed to be the correct coefficients for the entire cycle.
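
A minimal sketch of the two-term case of this least squares fit is given below.
It assumes that the linear relationship has the same form as the reconstruction
of Section 4.2 (mean cycle plus coefficient times optimal function), which is an
assumption made here and not a statement of the ADAPT program itself. The
six-month functions are invented; the normal equations are solved directly for
the two unknowns.

```python
def fit_two_coefficients(observed, mean_cycle, phi1, phi2):
    """Least squares fit of c1, c2 in
    observed[t] ~ mean_cycle[t] + c1 * phi1[t] + c2 * phi2[t]
    using only the months present in `observed` (the start of the cycle)."""
    n = len(observed)
    d = [observed[t] - mean_cycle[t] for t in range(n)]
    a11 = sum(phi1[t] * phi1[t] for t in range(n))
    a12 = sum(phi1[t] * phi2[t] for t in range(n))
    a22 = sum(phi2[t] * phi2[t] for t in range(n))
    b1 = sum(phi1[t] * d[t] for t in range(n))
    b2 = sum(phi2[t] * d[t] for t in range(n))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("the two functions are not independent over these months")
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


# Hypothetical six-month cycle built from known coefficients (NOT report data).
mean_cycle = [5.0, 20.0, 40.0, 50.0, 30.0, 10.0]
phi1 = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
phi2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
full_cycle = [m + 3.0 * a - 2.0 * b for m, a, b in zip(mean_cycle, phi1, phi2)]

# Fit on the first four months only, then extrapolate the whole cycle.
c1, c2 = fit_two_coefficients(full_cycle[:4], mean_cycle, phi1, phi2)
extrapolated = [m + c1 * a + c2 * b for m, a, b in zip(mean_cycle, phi1, phi2)]
print(c1, c2)
```

Here four equations determine two unknowns, in the same way that 38 or 76
equations determine two unknowns in the first and second quarter
extrapolations. The four-term third quarter case would need a general least
squares solver.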

As in the case of the predictions the performance of the prediction can be evaluated 
to a great extent by simply examining these coefficients. Again, the scatter plot 
is a convenient way to examine them. For the case of the first and second quarter 
extrapolations which are performed in two dimensions the scatter plot shown in 
Figure 4.38 is a complete comparison of the estimated and actual values of the
coefficients which will be used to predict this sunspot cycle. In the case of the 
third quarter extrapolation it is a comparison of the dominant information; however, 
two additional coefficients which are not shown on this figure will also be used in 
the prediction and may result in slightly different performance than would be 
obtained from the examination of Figure 4.38. Examination of this figure shows
that the third quarter extrapolation is significantly better than either the first
or second quarter extrapolation, in agreement with Figure 2.2. It is also interesting
to note that extrapolation is the first prediction technique to give reasonably good 
performance for cycle 19. 

The performance of each of these extrapolations for each of the dimensions 
considered on each of the learning and test cycles is summarized in terms of 
the RMS errors between the estimated and actual cycles in Table 4.6. The
mean and the standard deviation of the RMS errors for the learning data are
also presented on this table. As discussed in Section 3.4 these values of the
mean and standard deviation provide a validity criterion for the extrapolation
for data histories. For example, if the RMS error of the extrapolated portion 
of a test history exceeds the mean of the RMS of the learning data plus twice 
the standard deviation of this RMS error of the learning data one knows that 
only 5 percent of the cases belonging to the population of the learning data could 
have values of the RMS error which were this large. Thus, it is quite reasonable 
to assume that this case is significantly different from the learning data and 
caution should be exercised when using this extrapolation. Examination of the 
RMS error for cycle 19 as compared to the mean and standard deviation
shows that this 95 percent confidence level is exceeded. This then is a strong
indication that cycle 19 is indeed an anomalous cycle. Thus, the ADAPT
validity criterion does appear to work for the extrapolation.
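
The RMS validity test described above can be sketched as follows. The learning
RMS errors are invented numbers, not the entries of Table 4.6, and the use of
the sample standard deviation is an assumption, since the text does not say
which form was used.

```python
import math
import statistics


def rms_error(estimated, actual):
    """Root mean square difference between an estimated and an actual history."""
    if len(estimated) != len(actual):
        raise ValueError("histories must have equal length")
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))


def exceeds_validity_limit(rms, learning_rms, n_sigma=2.0):
    """True if rms is above the learning-data mean plus n_sigma standard deviations."""
    limit = statistics.mean(learning_rms) + n_sigma * statistics.stdev(learning_rms)
    return rms > limit


# Hypothetical learning-data RMS errors (NOT the values of Table 4.6).
learning_rms = [10.0, 12.0, 14.0, 16.0, 18.0]

print(exceeds_validity_limit(25.0, learning_rms))  # True: treat with caution
print(exceeds_validity_limit(18.0, learning_rms))  # False: within the learning range
```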

Figures 4.39 through 4.56 compare the extrapolated data histories with the
actual histories and with the anticipated two sigma variation in the prediction.

Figure 4.39 presents the extrapolated sunspot cycle 19 and its two sigma
bounds based on the extrapolation of the first 38 months of the cycle. Figure
4.40 compares the extrapolated history with the actual data history. Here we
see the surprising result that despite the indication from the validity criteria 
that cycle 19 is an odd cycle, we have an unusually good estimate of this cycle 
when compared to other techniques. It must be pointed out that although this 
estimate is quite good compared to other techniques it is not nearly as good as 
the estimates which can be expected by this extrapolation for normal sunspot 
cycles. But even the first 38 months of cycle 19 provided sufficient information 
to allow significantly better prediction of this cycle than any other technique has 
been able to do by utilizing only data from preceding cycles. This appears to be 
an important attribute of the extrapolation technique, namely, that it has a better 
chance of accounting for the anomalous cycle than the prediction techniques. 

Figure 4.41 compares cycle 19 actual values with the estimated two sigma errors 
that would be expected from the 38 month extrapolation. One would expect only 
five to ten months during the sunspot cycle in which the two sigma bounds should 
be exceeded. Examination of Figure 4.41 shows that the two sigma bounds are
exceeded for approximately 20 months of sunspot cycle 19. Thus, the validity
criterion's indication that cycle 19 is anomalous and its extrapolation would be poorer
than expected is verified. 

The first quarter extrapolation of the 20th cycle and its expected two sigma 
errors are presented in Figure 4.42. This cycle is compared with the actual
values in Figure 4.43 and the predicted values are in good agreement with the
actual values. When one compares the actual values with the expected two sigma
errors in Figure 4.44 one finds that the two sigma error is only exceeded two
times during this history. Thus, cycle 20 appears to be a reasonable extrapolation
based on just the first 38 months of the cycle. Figures 4.45 through 4.47 present
the same data for cycle 19 based on the extrapolation using the first 76 months
of the cycle. Here we see, as would be expected from examination of the optimal 
functions, very good agreement between the estimated and the actual although we 
still see an unexpectedly large number of cases for which the actual values 
exceed the expected two sigma error bounds. Figures 4.48 through 4.50 provide 
the same information for the second quarter extrapolation of cycle 20 and the 
conclusions are similar to the first quarter extrapolation with the exception 
that the accuracy of the extrapolation has been somewhat improved. 

Figures 4.51 through 4.53 present the results of the third quarter extrapolation
of cycle 19. The results are very similar to the first and the second quarter
extrapolations of this cycle with the exception that the error bands have been
significantly reduced. Figures 4.54 through 4.56 present the third quarter
extrapolation of cycle 20 and again the only significant difference between the 
third quarter extrapolation and the extrapolation using 76 or 38 months of the 
cycle is the reduction in the two sigma error. 

4.4 Comparison of Predictions

The preceding two sections have developed and presented the results of the 
two ADAPT approaches to predicting future sunspot numbers. Section 4.2
presented the ADAPT predictive approach which provides a capability to
perform long term predictions. Section 4.3 presented the ADAPT extrapolation
approach to completing the present cycle. The detailed results of the pre- 
dictions for cycles 19, 20 and 21 for these two methods have been given in 
those sections. In this section we shall compare the results of the ADAPT 
predictive and ADAPT extrapolative predictions with the simple and selective 
regression models which have been used for predicting the sunspot numbers. 

Comparison of Predicted Values 

Figure 2.1 presents the comparison of the latest available estimate (June 1972) 
with the ADAPT estimate for sunspot cycles 20 and 21. In examining this figure 
it must be realized that the conventional prediction is for a 12-month running
average evaluated quarterly, whereas the ADAPT predictions are for 81-day
running averages evaluated monthly. This difference has three major effects
on the predictions: the first is that the ADAPT predictions, being based on 
shorter running averages and evaluated more often, tend to have more of the 
detailed oscillations retained than the longer 12-month running average. The 
second effect is that since the 12-month running average contains data from 
earlier times the 12-month running average will reach a given sunspot number 
later than the 81-day running average. Third, the 12-month running average
will tend to lower the peak values and raise the minimum values associated with
the sunspot cycle.

Thus, we see there is considerable disagreement between the ADAPT methods
and the current predictions of the sunspot cycles. The most significant of these
is the disagreement in the time of the next minimum and therefore also the
prediction of the time of the next maximum. Figure 2.1 shows a minimum on the
current predictions of the 12-month running average in June 1975 as compared
to a minimum in February 1977 for the ADAPT predicted curve. However,
realizing that the ADAPT curve based on an 81-day running average will reach
the minimum approximately 3 to 4 months earlier than the 12-month running
average of the same data, February of 1977 is equivalent to April to June of
1977 for the 12-month running average. Thus, we see that the ADAPT predictions
indicate that the end of cycle 20 will occur approximately 1 1/2 to 2 years
later than the current predictions.
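
The delay and peak-lowering effects of the longer running average can be
illustrated with a small Python sketch. Several assumptions are made here that
are not in the report: the running averages are taken as trailing averages of
monthly values, the 81-day average is approximated by a 3-month window, and the
cycle is a synthetic smooth curve rather than sunspot data.

```python
import math


def trailing_average(series, window):
    """Trailing running average; months without a full window are None."""
    averaged = [None] * len(series)
    for i in range(window - 1, len(series)):
        averaged[i] = sum(series[i - window + 1 : i + 1]) / window
    return averaged


def first_month_reaching(averaged, level):
    """Index of the first month whose average is at or above level, else None."""
    for month, value in enumerate(averaged):
        if value is not None and value >= level:
            return month
    return None


def peak(averaged):
    """Largest value of a running average, ignoring the undefined months."""
    return max(v for v in averaged if v is not None)


# Synthetic 132-month cycle peaking at 100 (NOT real sunspot numbers).
cycle = [100.0 * math.sin(math.pi * t / 132.0) ** 2 for t in range(133)]
short_avg = trailing_average(cycle, 3)    # stand-in for the 81-day average
long_avg = trailing_average(cycle, 12)    # 12-month average

print(first_month_reaching(short_avg, 60.0), first_month_reaching(long_avg, 60.0))
print(peak(short_avg), peak(long_avg))
```

On this synthetic curve the 12-month average reaches a given level several
months after the short average and has a lower peak, which is the qualitative
behavior relied upon in comparing the two predictions.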

The ADAPT predictions presented here are a composite of the extrapolation for
cycle 20 and the prediction for cycle 21. The extrapolation for cycle 20 is the
best available extrapolation, based on extrapolating the first 93 months of cycle
20 to the end of cycle 20. Since extrapolation techniques are only suitable for
completing the present cycle, the prediction of cycle 21 was based on the
predictive approach. The two predictions are attached together at the point where
they each reach a value of approximately 5 for the sunspot number. This is
based on the result that the mean of the minima of the 81-day running average
sunspot number for the first 18 cycles is approximately 5.

The expected time difference between the 12-month running average and the 81-day
running average implies that for the remainder of the present cycle one
would expect the estimate of the 12-month running average to remain higher than
the estimate of the 81-day running average. Thus, the fact that the 12-month
running average lies slightly below the 81-day running average in Figure 2.1 is
an indication that the difference between the present method of estimating sunspot
numbers and the ADAPT extrapolation of cycle 20 is somewhat greater than would
be indicated by Figure 2.1. Similarly, for the beginning of cycle 21 one would
expect the 12-month running average to lie underneath the 81-day running average
and therefore Figure 2.1 indicates considerable difference in the estimates for
cycle 21. However, the major portion of this difference is due to the difference
in the predicted time of the next minimum, i.e., the start of cycle 21. Clearly,
the start of cycle 21 approximately a year and a half to two years later, as
predicted by ADAPT, accounts for the major difference between the estimates of
cycle 21 based on ADAPT predictions and the conventional estimates. One other major
difference in the estimates for cycle 21 is that the June projection for cycle 21
indicates a maximum sunspot number for the 12-month running average of slightly
over 80 sunspots, whereas the ADAPT prediction for cycle 21 indicates a maximum
of the order of 65 sunspots. This is particularly significant if one recalls
that the 12-month running average should tend to have lower peaks than
the 81-day running average.

Since the June projection of the remainder of cycle 20 and cycle 21 sunspot
numbers is a combination of the simple regression for the remainder of cycle
20 and the results of Reference 13 for cycle 21, it does not represent a fair
comparison between the results of Reference 13 and the ADAPT predictions
for the sunspot numbers through cycle 21. Figure 4.57 presents a figure
similar to Figure 2.1 which compares the best estimate presented in Reference
13 (see Figure 4.67 of Reference 13) with the ADAPT predictions. This figure
shows remarkable agreement between these two methods for the remainder of
cycle 20. The estimate provided by Sleeper remains just slightly above the
ADAPT 81-day running average for the remainder of cycle 20, which is exactly
what would be expected given that the Sleeper prediction is for the
12-month running average whereas the ADAPT prediction is for the 81-day
running average. The cycle 21 comparison between these two methods is identical
to that in Figure 2.1, corrected for the approximately one and a half to two
year difference in start time for cycle 21. That is, the Sleeper prediction
for cycle 20 yields the same start time for cycle 21 as does the ADAPT
extrapolation on cycle 20. Thus, this presents a better comparison of the cycle 21
predictions based on the selective regression proposed by Sleeper, which utilizes
only the 9 negative cycles to predict negative cycle 21. The only significant
difference between these two predictions is that ADAPT predicts a lower
peak activity for cycle 21 than does the selective regression method.
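
The effect of the averaging window on peaks and minima can be illustrated with
a minimal sketch. Everything in it is an assumption made for illustration: the
series is a synthetic, idealized 132-month cycle (not Zurich data), and centered
windows of 3 and 13 months are used as rough stand-ins for the 81-day and
12-month running averages.

```python
import math


def centered_running_average(series, window):
    """Centered running average; `window` must be odd.

    Entries where the full window does not fit inside the series are None.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    out = [None] * len(series)
    for i in range(half, len(series) - half):
        out[i] = sum(series[i - half:i + half + 1]) / window
    return out


# Synthetic monthly series: two idealized 132-month (11-year) cycles.
CYCLE_MONTHS = 132
monthly = [100.0 * math.sin(math.pi * t / CYCLE_MONTHS) ** 2
           for t in range(2 * CYCLE_MONTHS)]

short_avg = centered_running_average(monthly, 3)   # stand-in for the 81-day average
long_avg = centered_running_average(monthly, 13)   # stand-in for the 12-month average

peak_month = CYCLE_MONTHS // 2   # maximum of the first synthetic cycle
minimum_month = CYCLE_MONTHS     # minimum between the two synthetic cycles

print("peak:    short window %.2f, long window %.2f"
      % (short_avg[peak_month], long_avg[peak_month]))
print("minimum: short window %.2f, long window %.2f"
      % (short_avg[minimum_month], long_avg[minimum_month]))
```

For a smooth cycle of this kind the longer window gives the lower value at the
maximum and the higher value at the minimum, which is the sense of the
comparison between the two running averages made above.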

Comparison of Expected Accuracy 

The most significant way to compare the expected accuracy of the various
methods of predicting the sunspot cycles is to compare plots of their 95%
confidence bounds. These plots, comparing the four ADAPT predictions with the
simple and selective regression techniques based on 18 sunspot cycles, are shown
in Figure 2.2. The mean value of the first 18 cycles is presented as a solid curve
on this figure. The three dashed curves, Nos. 1, 2, and 3, represent the results
of the simple regression. Dash curve 1 represents the results of the simple
regression for predicting the next cycle based on a portion of the current cycle.
Dash (-) curves 2 and 3 represent the results of predicting the remainder of the
present cycle starting at 51 and 84 months respectively. The curves consisting
only of plus (+) signs, known hereafter as the plus curves, present similar
results for the selective regression technique. Again, the plus curve identified
by Number 1 is the prediction of the next cycle from a portion of the current
cycle. Plus curve Number 2 shows the results for predicting the remainder
of the current cycle starting at month 90. The results for the simple and selective
regression have been taken from References 10 and 13.

The solid lines interrupted by plus signs (+), crosses (x), squares (□), and
circles (O) present the 95% confidence bounds for the ADAPT predictions. The
solid line interrupted with plus signs represents the ADAPT extrapolation based
on using only the first quarter of the sunspot cycle. The solid line interrupted
by crosses represents the ADAPT extrapolation utilizing the first half of the
sunspot cycle, and the solid line interrupted by squares indicates the ADAPT
extrapolation based on the first three quarters of the sunspot cycle. These
extrapolations are essentially based on the same information as dash curves
2 and 3 and plus curve 2. Comparison of these six curves shows that in general,
with approximately half of the sunspot cycle available, the simple and selective
regressions can be expected to give better estimates than the ADAPT
extrapolations for periods of approximately 30 to 40 months. After that time the
results of the two approaches become quite similar until near the end of the cycle,
where the simple and selective regression approaches have difficulty associated
with the large variation about the mean being introduced by the following cycle.

Comparison of the solid line interrupted with squares with dash line 3 and
plus line 2 indicates that when approximately 80 months of the cycle are
available, the selective and simple regressions hold their advantage only for
a period somewhat under a year, after which the ADAPT extrapolation proves
considerably better for the remainder of the cycle.

If one wishes to project from the present cycle to the next cycle, the comparison
of the solid line interrupted by circles with dash line 1 and plus line 2
indicates that the ADAPT prediction is significantly better for the first half
of the cycle and approximately equal to the other methods during the greatest
portion of the second half of the cycle, with the exception of the very end
of the cycle, where the simple regression has difficulties associated with the
large variation expected around the beginning of the next cycle. It is interesting
that the ADAPT prediction from the preceding two cycles performs as well as or
slightly better than either the first or second quarter ADAPT extrapolations.

In approximately the first 70 to 80 months of the cycle both the selective and
simple regression techniques have 95% error bounds significantly larger than
the ADAPT methods. The reason for this is apparent if one recalls the shape
of the first optimal function for representing the sunspot cycles. Figure 4.4
showed that the first optimal function is almost entirely composed of information
in the first 70 to 80 months of the sunspot cycle. Recalling that this function
explained approximately 75% of the variation from the mean, it is clear that any
technique which tends to compensate equally throughout the cycle will make much
larger errors in this first 70 to 80 months, because this is where the greatest
variation lies.
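
The idea of a first optimal function that carries most of the variation about
the mean can be sketched as follows. This is a minimal illustration under a
stated assumption: the optimal function is treated here as the leading principal
component of the learning cycles about their mean cycle, which may differ in
detail from the ADAPT derivation. The five cycles are synthetic and are
constructed so that most of the cycle-to-cycle variation lies in the first 80
months; the function name and all numerical values are hypothetical.

```python
import math


def first_optimal_function(cycles, iterations=500):
    """Leading component of the deviations of `cycles` about their mean cycle.

    `cycles` is a list of equal-length lists. Returns (function, fraction),
    where `function` has unit norm and `fraction` is the share of the total
    squared variation about the mean that it explains.
    """
    n_cycles, n_months = len(cycles), len(cycles[0])
    mean = [sum(c[t] for c in cycles) / n_cycles for t in range(n_months)]
    dev = [[c[t] - mean[t] for t in range(n_months)] for c in cycles]

    # Gram matrix of the deviations; its trace is the total squared variation.
    gram = [[sum(a * b for a, b in zip(di, dj)) for dj in dev] for di in dev]
    total = sum(gram[i][i] for i in range(n_cycles))

    # Power iteration for the leading eigenvector. The start vector is not
    # constant, because a constant vector lies in the null space of the Gram
    # matrix (the deviations sum to zero across cycles).
    v = [float(i + 1) for i in range(n_cycles)]
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n_cycles))
             for i in range(n_cycles)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n_cycles))
                     for i in range(n_cycles))

    # Map the eigenvector back to a unit-norm function of month.
    scale = math.sqrt(eigenvalue)
    function = [sum(v[k] * dev[k][t] for k in range(n_cycles)) / scale
                for t in range(n_months)]
    return function, eigenvalue / total


# Synthetic learning set (illustrative only, not sunspot data).
N_MONTHS, EARLY = 132, 80
early_shape = [math.sin(math.pi * t / EARLY) if t < EARLY else 0.0
               for t in range(N_MONTHS)]
late_shape = [math.sin(math.pi * (t - EARLY) / (N_MONTHS - EARLY))
              if t >= EARLY else 0.0 for t in range(N_MONTHS)]
early_amp = [-30.0, -10.0, 0.0, 10.0, 30.0]
late_amp = [15.0, -15.0, 0.0, -15.0, 15.0]
cycles = [[60.0 * math.sin(math.pi * t / N_MONTHS)
           + a * early_shape[t] + b * late_shape[t] for t in range(N_MONTHS)]
          for a, b in zip(early_amp, late_amp)]

function, fraction = first_optimal_function(cycles)
early_energy = sum(x * x for x in function[:EARLY])
print("fraction of variation explained: %.3f" % fraction)
print("share of the function in the first %d months: %.3f" % (EARLY, early_energy))
```

In this synthetic case the leading function is confined to the early months, so
a method that spreads its correction evenly over the cycle would leave most of
the early variation unaccounted for, which is the argument made above.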

The simple and selective regression techniques are essentially methods of
utilizing the average or mean of the preceding cycles as a basis for extrapolating
the present cycle. This can be seen by considering the regression curves
started at later dates in the cycle, as indicated for the simple regression by
the dash curves numbered 2 and 3. Dash curve 2 starts
at approximately month 51 and by approximately month 80 has reached
dash curve 1, which represents the results of extrapolating forward an entire
cycle. Dash curve 3 starts at approximately month 85 and reaches this
extrapolation of an entire cycle forward at approximately month 120. Similarly,
the selective regression prediction indicated by plus curve 2, starting at month
90, also reaches the full cycle forward prediction based on the selective
regression, indicated by plus curve 1, at approximately month 120. From this we
conclude that the effect of the regression portion of the simple and selective
regressions is to buy 30 to 40 months of improvement over simply assuming
that the next sunspot number is the mean of all the preceding sunspot cycles.

In other words, the simple and selective regressions amount to an extrapola- 
tion procedure to account for the additional information that the knowledge of 
the present position of the sunspot cycle provides the user. We conclude that 
we would achieve similar results to both the simple and selective regression 
by applying these regressions over no more than a three or four year period 
and at the end of this three to four year period simply assuming the remainder 
of the cycle is the mean cycle. 
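
The procedure described in the preceding paragraph can be sketched in a few
lines. This is an illustration only: the function name, the 36-month default
horizon (the text allows three to four years), and the input values are
assumptions, and the regression forecast itself is taken as given.

```python
def regression_then_mean(regression_forecast, mean_cycle, start_month,
                         horizon_months=36):
    """Forecast from `start_month` to the end of the cycle.

    regression_forecast[k] is the regression estimate for month start_month + k.
    mean_cycle[t] is the mean of the learning cycles at month t of the cycle.
    The regression is used for the first `horizon_months`; after that the
    forecast is simply the mean cycle.
    """
    forecast = []
    for month in range(start_month, len(mean_cycle)):
        k = month - start_month
        if k < horizon_months and k < len(regression_forecast):
            forecast.append(regression_forecast[k])
        else:
            forecast.append(mean_cycle[month])
    return forecast


# Hypothetical inputs: a flat mean cycle and a regression that starts above it.
mean_cycle = [50.0] * 132
regression = [70.0 - 0.5 * k for k in range(48)]
forecast = regression_then_mean(regression, mean_cycle, start_month=60)
```

The hard switch at the end of the horizon leaves a small step in the forecast.
A tapered transition could be substituted, but that refinement is not part of
the procedure described above.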

Thus, it appears that the best methods currently available to predict future
sunspot numbers are as follows. The prediction over the next three to six
month period from any given time is presently best made using an interpolation
between the predicted value and an extension of the present value using
the predicted variation. If one has more than approximately half of the present
cycle available, the prediction to the end of the present cycle can best be
accomplished by using the ADAPT extrapolation. Projection of the next cycle,
regardless of the position in the present cycle, is best made by using the ADAPT
prediction algorithms.

The preceding results also provide strong indications of how the prediction of
future sunspot numbers can be further significantly improved. The first major
improvement which can be made to the ADAPT techniques is to incorporate some
extrapolative capability to make use of the knowledge that the sunspot number in
the immediate future is very strongly influenced by the present value. This is
particularly true because one is using running averages, and it is
not possible for radical changes in the sunspot number to occur in very short
times. Thus, it is recommended that an interpolative procedure be added to the
ADAPT extrapolations to better account for the current value of the sunspot
number.

The preceding 2 cycles contain extremely important information for the prediction
of the sunspot cycle. On the other hand, the ADAPT extrapolations have shown
that considerable advantage can be gained by utilizing information in the present
sunspot cycle. Thus, it is recommended that the best procedure for estimating
future sunspot numbers is to develop an ADAPT extrapolative procedure utilizing
the preceding 2 cycles plus the available portion of the present cycle. For
long term estimates, the ADAPT prediction algorithms based on the preceding
two cycles will give the best results. The fact that one has reasonably good
estimates based on predicting the current cycle from the preceding two cycles
suggests that this procedure should be good for a period of at least 22 years
and probably significantly more. The next section of this report will summarize
these recommendations as well as outline a program to implement them.

Table 2.1 presented a more compact summary of the errors of the various
prediction techniques. The first column in this table presents the RMS error,
defined in Section 3, of the one sigma error band relative to the estimated
value of the sunspot number. The second column presents the RMS error of
the learning data when one assumes the estimate to be the mean of the learning
data and for the ADAPT predictive techniques. An approximation to the RMS
error is equal to the reduction in RMS error expected for a particular algorithm
times the RMS error achieved by using the mean of the sunspot cycles. This
estimate of the RMS error is presented in column 3 which is headed 23.^7^^ -j.* 
This table also presents the RMS error observed for predictions of cycles 
19 and 20 utilizing each of the methods. 
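
Written as a formula, the approximation described above might read as follows.
The symbols are assumptions introduced here for illustration (the asterisk
notation follows the caption of Table 4.6) and are not taken from Table 2.1
itself.

```latex
E_{\mathrm{RMS}} \;\approx\; E^{*}_{\mathrm{RMS}} \times E^{\mathrm{mean}}_{\mathrm{RMS}}
```

Here $E^{*}_{\mathrm{RMS}}$ is interpreted as the dimensionless reduction in RMS
error expected for a particular algorithm, and $E^{\mathrm{mean}}_{\mathrm{RMS}}$
as the RMS error obtained when the mean of the sunspot cycles is used as the
estimate.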

The methods considered are simply taking the sunspot number as the mean
of the corresponding point in the learning data histories, the simple regression
over the period of September 1971 to September 1983, and the selective
regression over the period of March 1972 to March 1984. These are compared with
the four ADAPT estimates: 1) the ADAPT predictions, 2) the ADAPT extrapolation
over the first quarter cycle, 3) the ADAPT extrapolation using the first and
second quarters, and 4) the ADAPT extrapolation using the first three quarters.

Table 2.1 provides further verification of Figure 2.2, i.e., that considering
the entire cycle one finds that the best prediction is made by the ADAPT
extrapolation using the third quarter data, the next best is the second quarter
extrapolation, and the third best is either the ADAPT prediction or the ADAPT
extrapolation of the first quarter. It is clear from the preceding discussion
of Figure 2.2 that these gross summaries do not give the entire story, because
there are regions in which some of the techniques which show up relatively
poorly as a predictor of an entire cycle show certain significant advantages
for a portion of the cycle.

A more detailed examination of the performance of the ADAPT derived prediction 
on the learning data can be made by comparing the actual and predicted learning 
sunspot cycles. The information required to perform this comparison for the 
ADAPT prediction and third quarter extrapolation algorithm is presented in 
Appendix D. 





4.5 Recommendations


The application of ADAPT to estimating future sunspot cycles which has been 
described in Section 4 leads to recommendations in two general areas. The 
first is that of defining how one should best make estimates of future sunspot 
numbers using the available algorithms. The second is the definition of analysis 
which should lead to significant improvement in the available algorithms for 
estimating future sunspot numbers. 

Sunspot Estimates Using Available Algorithms 

Based on the preceding analysis it is recommended that the prediction algorithm
presented in Tables 4.3 and 4.4 be used to predict all future sunspot cycles
(i.e., cycles beyond the current cycle) and the first 75 months of the current
cycle. For months 75 through the end of the current cycle the ADAPT
extrapolation as described in Section 4.3 and Appendix C should be used. In both
cases, the estimate for the immediate future, that is the next 3 to 6 months, can
be improved by interpolating between the predicted value and the value which
would have been obtained by extending the current sunspot number utilizing the
predicted variation for the next six months.
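
One possible reading of this short term interpolation is sketched below. The
linear weighting between the two estimates, the function name, and the numerical
values are assumptions made for illustration; the text above specifies only that
one interpolates between the predicted value and the extension of the current
value.

```python
def short_term_correction(predicted, actual_now, horizon=6):
    """Blend a prediction with the extension of the current actual value.

    predicted[k] is the predicted sunspot number k months from now, with
    predicted[0] the prediction for the current month. `actual_now` is the
    current known value. Over the next `horizon` months the result moves
    linearly (an assumed weighting) from the extended current value to the
    prediction; beyond the horizon the prediction is returned unchanged.
    """
    if horizon < 1 or len(predicted) <= horizon:
        raise ValueError("need predictions for at least horizon + 1 months")
    corrected = list(predicted)
    for k in range(horizon + 1):
        # Current value carried forward using the predicted month-to-month variation.
        extended = actual_now + (predicted[k] - predicted[0])
        weight = k / horizon  # 0 at the current month, 1 at the end of the horizon
        corrected[k] = (1.0 - weight) * extended + weight * predicted[k]
    return corrected


# Hypothetical numbers: the prediction is 6 units below the current actual value.
predicted = [50.0, 48.0, 46.0, 44.0, 42.0, 40.0, 38.0, 36.0]
corrected = short_term_correction(predicted, actual_now=56.0, horizon=6)
```

With this weighting the corrected curve passes through the current known value
and rejoins the prediction at the end of the horizon, so the initial offset
between actual and predicted values decays linearly to zero.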

The short term (i.e., six months) correction to the ADAPT predictions has been
recommended to overcome the disadvantage which ADAPT has as a result of
predicting the entire sunspot cycle without ensuring that the prediction actually
goes through the most recent known values of the sunspot number. It is believed
that if the above recommendations are followed, predictions having approximately
a factor of three improvement over any currently available can be achieved. It
has also been shown that these prediction techniques offer an opportunity to
provide reasonable predictions as much as two or more cycles in advance of the
current sunspot cycle.

Improvement in Sunspot Prediction Techniques 


Although the ADAPT analysis to date has produced significant improvement in 
the ability to estimate future sunspot numbers, considerably greater improve- 
ment can still be achieved by making use of what has been learned from this 
study and from the studies outlined in Reference 13. The improvements in pre- 
diction capability can be expected in the two areas of methodology and an improved 
data base. 

The present study has shown that the two cycles preceding any given cycle
contain significant information for predicting that cycle. Furthermore, the
results from cycle 19 showed that this information is not completely redundant
with the information contained in the first part of a sunspot cycle. Thus, it
follows that one could significantly improve the extrapolations, which have
already been quite successful using the ADAPT approach, by using a 3 cycle rather
than a 1 cycle base for the extrapolation. Both the extrapolation and the
predictive approaches can be improved over the present results by including a
procedure to account for the fact that the present value for the sunspot cycle
is known and in general different from the present value estimated by the ADAPT
approach. It is also possible to make use of the fact that negative sunspot
numbers are inadmissible. This is perhaps more important for the studies
which will be discussed in Section 5 but could make some additional
contribution to the accuracy of the estimates of future sunspot cycles. To
incorporate this in the extrapolation algorithm requires the use of a nonlinear
programming analysis in place of the least squares fit for the extrapolation.

In addition to improving the methods as outlined above, it is clear from the
results of the present study and of the work in Reference 13 that considerable
additional information is available which can be used to improve sunspot
predictions. The first thing that should be included in the predictions is all of
the available sunspot data. Avco believes that the present study has adequately
demonstrated the advantages of the ADAPT approach to predict future sunspot
cycles and any future applications should include all of the available data. Thus,
it is proposed that as a minimum cycles 19 and 20 be included in the base for
developing any further algorithms. In addition, Reference 13 has shown that
there may be a high correlation between sunspot cycles and such quantities as
the angular momentum of the solar system (dP/dt), the position of the sunspot
cycle in the 180 year period, the polarity of the expected sunspot cycle and the
mode classification of the sunspot cycle. With the exception of the mode of the
sunspot cycles, all of these quantities are known prior to the beginning of the
prediction task. They may therefore be included in the data vector used to predict
the sunspot cycle. It should be noted that the angular momentum is a data history
in itself and may be included in the same way as the preceding sunspot cycles.

It is suggested that an annual measure of angular momentum of the solar system 
for the preceding two cycles as well as for the period over which the sunspot 
history is to be predicted should be included in the data history. The position in 
the 180 year sunspot cycle should be included in two different ways to account 
for possible nonlinear effects. The first is simply to assign a value to this 
variable equal to the number of months from the start of the most recent 180 
year period to the start of the present sunspot cycle. In addition to this, 16 
binary variables should be introduced which have the value of zero except for 
that variable corresponding to the position (i.e. number of cycles since begin- 
ning of the 180-year history) for the sunspot cycle being predicted. Inclusion 
of these additional variables in the derivation will probably lead to improved 
predictions, and at the very least provide a conclusive determination of the 
importance of these variables to the estimation of sunspot number. 
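
The two encodings of position in the 180 year period can be sketched as
follows. The function name and the zero-based position index are assumptions
made for illustration; the count of 16 binary variables is taken from the
paragraph above.

```python
def position_features(months_since_period_start, cycle_position, n_positions=16):
    """Encode the position of a cycle in the 180 year period in two ways.

    months_since_period_start: months from the start of the most recent
        180 year period to the start of the cycle being predicted.
    cycle_position: number of cycles since the beginning of the 180 year
        period (zero-based, an assumed convention).
    Returns one continuous value followed by `n_positions` binary variables,
    all zero except the one for `cycle_position`.
    """
    if not 0 <= cycle_position < n_positions:
        raise ValueError("cycle_position must lie in 0..n_positions - 1")
    indicators = [0] * n_positions
    indicators[cycle_position] = 1
    return [months_since_period_start] + indicators


# Hypothetical example: a cycle starting 540 months into the period, in position 4.
features = position_features(540, 4)
```

The continuous value lets a linear algorithm capture a steady trend across the
period, while the binary variables allow a separate adjustment for each position,
which is one way to account for the possible nonlinear effects noted above.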

The present study has indicated that there is a reasonably high probability that
application of the ADAPT techniques to the available data, plus an analysis
similar to that carried out in Reference 14, can result in the addition of a
significant number of sunspot cycles to the learning data. The justification for
and approach to accomplishing this will be presented in more detail in Section 5;
however, to the degree to which this is successful, more learning data will be
available at the conclusion of such an effort. Clearly, this additional learning
data should be included in the development of algorithms for predictions of future
sunspot cycles. It may be possible to recover sufficient data to make the
estimate of future 180 year cycles feasible. If this is the case, the prediction
of the next 180 year cycle provides additional information which should be
incorporated in the data history as learning data.

Based on the preceding discussion, it is recommended that a two-phase
program be implemented to upgrade the prediction of future sunspot cycles. The
first phase would be aimed at immediately improving techniques for estimating
sunspot cycles. The second phase would be aimed at a long term upgrading of
the techniques for estimating sunspot cycles by making use of additional
information and techniques, such as the recovery of additional sunspot cycles
prior to 1750, which would take a considerable length of time to achieve. This
two-phase program is recommended since it is believed that significant
improvements, even relative to the new ADAPT derived algorithms, are possible in
a matter of months. This can be accomplished by using all of the currently
available data, the three cycle base for extrapolation, and the available
auxiliary information such as polarity, angular momentum and position in the 180
year history. On the other hand, it is also believed that after the use of the
ADAPT techniques to carry out the recovery of additional sunspot cycles, further
significant improvements, especially in the long range prediction (greater than
15 to 20 years) of the sunspot cycles, are likely.

Immediate improvements are recommended for both the extrapolation and pre- 
diction techniques using the ADAPT technology. For both the extrapolation and 
prediction techniques it is recommended that the data base consist of all available 
data from cycles 0 through 20, as well as the polarity and position in 180 year 
cycle of the cycle being predicted. In addition, the angular momentum of the 
solar system over the period of the preceding two cycles and for the period of 
the cycle being predicted should be included in the data history. 

The extrapolation should be based on the use of the data from the preceding
two cycles as well as the available portion of the cycle being predicted. In
addition, a short term correction algorithm should be developed to account for
the fact that the present value of the sunspot number is known but slightly
different from the extrapolated value for the present sunspot number. This
correction algorithm would take as input the predictions over the next six to
twelve months and the actual value, and provide as output corrections to the
predicted values to account for the present actual value of the sunspot number.
The ADAPT prediction algorithms should be developed exactly as they were in the
present study with the improved data base described above. Analysis should also
be carried out to determine the feasibility and complexity associated with
introducing the constraint of positive values of the sunspot numbers into the
extrapolation algorithm.

The long term improvement in sunspot prediction accuracy should rest pri- 
marily on the addition of the data developed by the studies recommended in 
Section 5. When these studies are completed it is recommended that the new 
data be used to develop improved algorithms in essentially the same manner 
as outlined above. It may also prove desirable to develop algorithms for pre- 
dicting the 180 year cycle if sufficient sunspot data can be recovered. It also 
may prove possible to recover annual data considerably further back in time 
than monthly data in which case algorithms should be developed to predict annual 
sunspot averages for long periods in the future. Clearly, the detailed definition
of the phase two tasks must await the completion of the studies for recovering
additional sunspot cycles which will be discussed in Section 5.

TABLE 4.1


SUNSPOT CYCLE START AND END DATES AS USED IN ADAPT ANALYSIS

CYCLE NO.     BEGIN DATE       END DATE

    1         June, 1755       Aug., 1766
    2         Aug., 1766       Sept., 1775
    3         Sept., 1775      July, 1784
    4         July, 1784       July, 1798
    5         July, 1798       Sept., 1810
    6         Sept., 1810      June, 1823
    7         June, 1823       May, 1834
    8         May, 1834        Oct., 1843
    9         Oct., 1843       Sept., 1855
   10         Sept., 1855      Feb., 1867
   11         Feb., 1867       March, 1879
   12         March, 1879      April, 1890
   13         April, 1890      June, 1902
   14         June, 1902       June, 1913
   15         June, 1913       March, 1924
   16         March, 1924      Dec., 1933
   17         Dec., 1933       June, 1944
   18         June, 1944       June, 1954
   19         June, 1954       Sept., 1964

EFFECT OF DIMENSIONALITY ON PREDICTED ALGORITHM PERFORMANCE

TABLE 4.4 (CONT'D)

TABLE 4.5


COMPARISON OF RMS ERROR AND REPRESENTATION USING 6 DIMENSIONS

CYCLE NO.     RMS ERROR     REPRESENTATION
   (n)          E RMS         (n-1, n-2)

    3           26.13           .781
    4           18.51           .890
    5           15.79           .950
    6            8.85           .963
    7           21.28           .950
    8           20.56           .951
    9           19.68           .923
   10           11.88           .978
   11           14.83           .904
   12           23.51           .815
   13           14.81           .780
   14           19.81           .552
   15           12.47           .698
   16           20.78           .627
   17           11.23           .680
   18           19.17           .801

 Avg.           17.5            .83
 Std. Dev.       4.7            .13

   19           44.3            .74
   20           13.7            .91
   21            -              .92

TABLE 4.6


SUMMARY OF RMS ERROR, E RMS*, FOR EXTRAPOLATION OF SUNSPOT CYCLES

FIGURE 4.4 SINGLE CYCLE FIRST OPTIMUM FUNCTION

FIGURE 4.5 SINGLE CYCLE SECOND OPTIMUM FUNCTION

FIGURE 4.6 SINGLE CYCLE THIRD OPTIMUM FUNCTION

FIGURE 4.7 SINGLE CYCLE FOURTH OPTIMUM FUNCTION

FIGURE 4.9 SINGLE CYCLE SIXTH OPTIMUM FUNCTION

FIGURE 4.18 RELATIVE IMPORTANCE SPECTRUM FOR CANONICAL PREDICTION OF
FIRST COEFFICIENT

FIGURE 4.19 RELATIVE IMPORTANCE SPECTRUM FOR CANONICAL PREDICTION OF

FIGURE 4.25 RELATIVE IMPORTANCE SPECTRUM FOR PREDICTION OF SECOND

FIGURE 4.25 RELATIVE IMPORTANCE VECTOR FOR PREDICTION OF FIRST COEFFICIENT

FIGURE 4.31 PREDICTED (USING CYCLES 17 AND 18) SUNSPOT NUMBER AND 2-SIGMA
ERROR BOUNDS FOR CYCLE 19

FIGURE 4.36 COMPARISON OF ACTUAL SUNSPOT NUMBER AND 2-SIGMA BOUNDS ON
PREDICTION FOR CYCLE 20

FIGURE 4.37 EFFECT OF DIMENSIONALITY ON THE RMS ERROR OF THE SUNSPOTS
EXTRAPOLATION

FIGURE 4.38 SCATTER PLOT COMPARISON OF THE PREDICTED AND ACTUAL FIRST

FIGURE 4.41 COMPARISON OF ACTUAL AND 2-SIGMA ERROR BOUNDS FOR 1ST
QUARTER EXTRAPOLATION OF SUNSPOT CYCLE 19

FIGURE 4.42 1ST QUARTER EXTRAPOLATION OF SUNSPOT NUMBER AND 2-SIGMA ERROR
BOUNDS FOR CYCLE 20

FIGURE 4.44 COMPARISON OF ACTUAL AND 2-SIGMA ERROR BOUNDS FOR 1ST
QUARTER EXTRAPOLATION OF SUNSPOT CYCLE 20

FIGURE 4.45 2ND QUARTER EXTRAPOLATION OF SUNSPOT NUMBER AND 2-SIGMA
ERROR BOUNDS FOR CYCLE 20

FIGURE 4.50 COMPARISON OF ACTUAL AND 2-SIGMA ERROR BOUNDS FOR 2ND
QUARTER EXTRAPOLATION OF SUNSPOT CYCLE 20

FIGURE 4.51 3RD QUARTER EXTRAPOLATION OF SUNSPOT NUMBER AND 2-SIGMA
ERROR BOUNDS FOR CYCLE 19

FIGURE 4.52 COMPARISON OF ACTUAL AND 3RD QUARTER EXTRAPOLATION OF
SUNSPOT CYCLE 19

FIGURE 4.53 COMPARISON OF ACTUAL AND 2-SIGMA ERROR BOUNDS FOR 3RD
QUARTER EXTRAPOLATION OF SUNSPOT CYCLE 19

FIGURE 4.54 3RD QUARTER EXTRAPOLATION OF SUNSPOT NUMBER AND 2-SIGMA
ERROR BOUNDS FOR CYCLE 20

FIGURE 4.56 COMPARISON OF ACTUAL AND 2-SIGMA ERROR BOUNDS FOR 3RD
QUARTER EXTRAPOLATION OF SUNSPOT CYCLE 20

FIGURE 4.57 COMPARISON OF ADAPT AND SELECTIVE REGRESSION ESTIMATES FOR

5.0 ANALYSIS OF SUNSPOT DATA


5.1 Estimate of Sunspot Cycle Properties

The ADAPT regression techniques provide a capability to estimate character- 
istics of future, or past, sunspot cycles, based on the present or previous 
sunspot cycles and auxiliary information. Three characteristics which appear 
particularly useful to predict are the period, the maximum sunspot number, 
and the time of the maximum for the preceding or succeeding sunspot cycle. 

This information, although it may be extracted from the prediction of the
sunspot numbers for the cycle, is useful as an independent prediction for two
reasons. The first is that the selection of these parameters from the predicted
sunspot number is often somewhat ambiguous. For example, consider the task
of estimating the period of the cycle. The estimated cycle approaches zero and
in some cases drops below zero sunspot number. The exact intercept which
one should take as the end of the cycle is not absolutely clear. Similar problems
occur, once the error bands are added, in estimating the exact time of the maximum
or the value of the maximum. Thus, the direct prediction of these quantities could
overcome some of the ambiguity in estimating them for cycles which have been
predicted.
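
The ambiguity in reading the period from a predicted cycle can be made concrete
with a minimal sketch. The predicted curve used here is synthetic (a straight
rise to a maximum followed by a straight decline that passes below zero), and
the function name and threshold values are chosen for illustration only.

```python
def period_from_threshold(predicted, threshold):
    """First month after the maximum at which the prediction is at or below
    `threshold`, or None if the prediction never falls that low."""
    peak_month = max(range(len(predicted)), key=lambda t: predicted[t])
    for month in range(peak_month + 1, len(predicted)):
        if predicted[month] <= threshold:
            return month
    return None


# Synthetic predicted cycle: rises to 100 at month 48, then declines by 1.25
# per month and drops below zero near the end.
predicted = [100.0 * t / 48 if t <= 48 else 100.0 - 1.25 * (t - 48)
             for t in range(140)]

periods = {threshold: period_from_threshold(predicted, threshold)
           for threshold in (10, 5, 3, 0)}
print(periods)
```

For this synthetic curve the estimated period ranges from 120 to 128 months
depending on which intercept is taken as the end of the cycle, a spread of
eight months arising from the choice of threshold alone.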

The second and perhaps more important reason for this prediction capability
is to predict characteristics of future or past cycles for which the sunspot
numbers have not yet been predicted. Reference 13 has shown that certain
characteristics of sunspot cycles behave in an orderly way. Figures 5.1 and
5.2, which have been taken from Reference 13, illustrate such behavior. Figure 5.1
shows the peak sunspot number as a function of date for the negative cycles.
Examination of this figure shows a very orderly process over a 180 year cycle. A
similar figure for the positive cycles is presented in Reference 13. However,
the positive cycles have a much less orderly behavior. The solid squares on
Figures 5.1 and 5.2 represent the updated positions for cycles 20 and 21 based
on the most recent ADAPT predictions presented in the preceding sections. It
should be recalled, however, that the ADAPT predictions are for 81-day running
averages rather than for 12-month running averages. The ADAPT points shown
in Figures 5.1 and 5.2 have been corrected to 12-month averages. The polarity
and mode for cycle 20 remain the same as estimated in Reference 13 and, in
fact, the new values show better agreement with these correlations than the
predictions of Reference 13.

The ability to predict the maximum sunspot number, the period, and location of
the maximum allows one to place the next sunspot cycle on one or both of these
types of figures. From this one can obtain an estimate of how typical the next
cycle will be and thus have an additional validity criterion. In addition, the
relationships between these quantities indicated by these figures allow one to
correct these predictions by moving the point to a region on the plots consistent
with the behavior of the previous sunspot cycles. This improved estimate of the
period, the maximum sunspot number and the time of maximum sunspot number
could be used as input data to the prediction of the sunspot cycle. It would
probably add information which is not used in the present ADAPT predictions
since it involves a very nonlinear procedure.

The present analysis was concerned primarily with development of techniques
for predicting future cycles, and thus very little effort was spent on
predicting properties of the cycle in general. However, a good estimate of the
period of a cycle is required to estimate the next succeeding cycle. The
simplest method for predicting the period of a sunspot cycle for which one has
estimated the sunspot numbers as a function of time is to examine this
estimate and determine when it reaches zero. The difficulty with this approach
is in the definition of when the predicted sunspot cycle actually reaches
zero. Because there is always a finite error to be expected, it is extremely
unlikely that the sunspot cycle will reach a true zero at the end of the
cycle; in fact the minimum for the 81-day running average over the first 18
cycles is a sunspot number of 5.3. Thus, it appears more reasonable to use the
crossing of 5.3 as the nominal estimate of the period of the sunspot cycle.
Table 5.1 presents a comparison of using the crossing of sunspot numbers 3, 5,
and 10 as an estimate of the sunspot period over the learning set of cycles 1
thru 18. We see that use of a sunspot number of 3 tends to underestimate the
threshold and use of a sunspot number of 10 tends to overestimate the
threshold. The standard deviation of the error in the estimate of the period
actually tends to be a minimum between sunspot numbers of 5 and 10. Thus, we
shall use the general ground rule for this procedure that the intercept of the
prediction with the sunspot number of 5 constitutes the end of the sunspot
cycle and defines the period of the sunspot cycle. A similar analysis using
the third quarter extrapolated sunspot histories indicated that the best
estimate for the extrapolated histories is obtained by using as the intercept
5 above the minimum sunspot number. This modification was required since
some of the extrapolated histories produced small negative sunspot numbers.
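
The intercept rule described above can be sketched in a few lines of Python.
This is a minimal illustration and not the ADAPT implementation: the function
name period_from_threshold, the decision to search only after the cycle
maximum, the decision to take the minimum over that same post-maximum portion,
and the sample histories are all assumptions made for the example.

```python
from typing import Optional, Sequence


def period_from_threshold(history: Sequence[float],
                          threshold: float = 5.0,
                          relative_to_minimum: bool = False) -> Optional[int]:
    """Return the first month index at or after the cycle maximum where the
    estimated sunspot number is at or below the threshold level.

    With relative_to_minimum=True the level is measured above the minimum of
    the post-maximum portion, which keeps the rule usable when an
    extrapolated history dips below zero (assumed reading of the text).
    """
    values = [float(v) for v in history]
    peak = values.index(max(values))          # month of maximum
    tail = values[peak:]                      # declining portion of the cycle
    level = threshold + (min(tail) if relative_to_minimum else 0.0)
    for offset, value in enumerate(tail):
        if value <= level:
            return peak + offset
    return None                               # estimate never reaches the level


# Hypothetical estimated histories, one value per month.
estimate = [10, 60, 100, 80, 40, 12, 6, 4, 8]
with_negatives = [5, 50, 90, 30, 3, -4, -2]

print(period_from_threshold(estimate, threshold=5))                     # 7
print(period_from_threshold(estimate, threshold=10))                    # 6
print(period_from_threshold(estimate, threshold=3))                     # None
print(period_from_threshold(with_negatives, relative_to_minimum=True))  # 5
```

The sample output shows how strongly the answer depends on the threshold: a
threshold that the estimate never reaches yields no period at all, which is
the practical argument for a level near the observed minimum.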

Algorithms were also developed to predict the period of another cycle both from 
the preceding two cycles and from the coefficients of the cycle itself. The latter 
prediction algorithm was made for use with the extrapolation predictions where 
one has estimates of the coefficients of the cycle for which one would like to know 
the period. The prediction of the period of the next sunspot cycle is based on a 
two cycle base and allows one to predict the period of the future sunspot cycle 
for which the coefficients have not yet been estimated. 

Figures 5.3 through 5.7 present the performance and relative importance
information for the algorithms developed to predict the period of a sunspot
cycle using the ADAPT estimate of that sunspot cycle. Note that one could use
either the extrapolated estimate or the predicted estimate with this algorithm
to find the period of the sunspot cycle. Figure 5.3 presents a plot of the
estimated and the actual values of the period for sunspot cycles 1 through 18,
which were used as learning data for developing this algorithm in six
dimensions. The solid line on Figure 5.3 is the line of perfect agreement. The
dashed lines indicate a one-year error in estimating the length of the sunspot
cycles. Only three cycles have their period estimated in error by more than
one year using this algorithm. The 2-sigma error for this algorithm is
approximately 18 months.

The relative importance vector for this algorithm is shown in Figure 5.4. As
before, this vector shows the importance of each region in the estimate of the
sunspot cycle to determining the period of that sunspot cycle. The dot product
of the relative importance vector with the data history yields a number which
differs from the period estimate by a known constant. This algorithm is
summarized in Table 5.2, which has been designed to allow one to implement the
algorithm without further reference to this report. The relative importance
spectrum associated with this algorithm is shown in Figure 5.5. Here we see
that the dominant term in the optimal series for determining the period is the
sixth term. The second, fourth, and fifth terms also make contributions to
this estimate. Thus, we conclude that one can make a better estimate of the
period using six dimensions than one would using only two dimensions. To
verify this, an algorithm was developed in two dimensions. The estimated
versus actual values of the period are shown in Figure 5.6 for the two
dimensional algorithm. Again the dashed lines indicate an error of one year.
The 2-sigma variation of the estimate of the period using this algorithm is
24.4 months. Thus, we have verified the conclusion suggested by examination of
the relative importance spectrum.
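
The statement that the period follows from a single dot product plus a known
constant can be illustrated as follows. The sketch assumes that the period
estimate is a linear function of the first six coefficients of the optimal
series, which the text implies but does not state explicitly here. The basis,
average cycle, weights and intercept are synthetic stand-ins and are not the
values of Table 5.2; all names are hypothetical.

```python
import numpy as np


def relative_importance_vector(basis: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Collapse per-coefficient weights into one vector over the months.

    basis   : (k, n) array, one optimal function per row (assumed orthonormal)
    weights : (k,) array, weight of each coefficient in the period estimate
    """
    return weights @ basis


def estimate_period(history, importance, constant) -> float:
    """Period estimate = dot product with the data history + known constant."""
    return float(np.dot(importance, history) + constant)


# Synthetic stand-ins (assumptions, not report data).
rng = np.random.default_rng(0)
n_months, k = 180, 6
q, _ = np.linalg.qr(rng.normal(size=(n_months, k)))
basis = q.T                                   # (6, 180), orthonormal rows
mean_cycle = rng.uniform(0.0, 100.0, size=n_months)
weights = rng.normal(size=k)
intercept = 132.0                             # months, illustrative only

history = mean_cycle + 40.0 * basis[0] - 15.0 * basis[5]

# Route 1: project onto the six optimal functions, then weight coefficients.
coeffs = basis @ (history - mean_cycle)
period_a = float(weights @ coeffs + intercept)

# Route 2: one dot product with the raw history plus a fixed constant.
r = relative_importance_vector(basis, weights)
constant = intercept - float(r @ mean_cycle)
period_b = estimate_period(history, r, constant)

print(abs(period_a - period_b) < 1e-8)        # the two routes agree
```

Folding the average cycle into the constant is what allows the algorithm to be
tabulated once and applied to any estimated history without recomputing the
optimal series.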

The algorithms derived on the single cycle base allow one to predict the
period which should be associated with any estimated sunspot cycle. It is also
desirable to be able to predict the period of the next sunspot cycle without
making the prediction for the cycle itself. To do this it was decided that the
best predictor would be the same predictor used to predict the next sunspot
cycle, namely, the double cycle base. Thus, an algorithm was developed using
the double cycle base to predict the period of the next cycle. This algorithm
was developed in exactly the same way as the algorithm to predict the
coefficients of the next sunspot cycle. The performance of this algorithm,
when developed in six dimensions, is shown in the plot of the estimated versus
actual periods in Figure 5.8. The dashed lines show the one year error bands,
and we see that the predicted periods for four cycles have errors greater than
one year. The 2-sigma error for this prediction is 20.4 months. Figure 5.9
shows the relative importance vector for predicting the period of the next
sunspot cycle using the preceding two sunspot cycles. It is interesting to
note that both of the preceding sunspot cycles make a significant contribution
to the estimate of the period of the next cycle. The most dominant half cycle
in the estimate is the first half of the second cycle preceding the cycle for
which the estimate is being made. The least important half cycle is the second
half of this same cycle. This relative importance vector provides further
evidence of the wisdom of selecting the two preceding cycles as the basis for
predicting information regarding a sunspot cycle, rather than utilizing just
the preceding cycle. Table 5.3 presents the detailed instructions for applying
this algorithm for predicting the period of future sunspot cycles.

Figure 5.10 presents the relative importance spectrum associated with this
algorithm for predicting the period of a sunspot cycle based on the preceding
two sunspot cycles. This relative importance spectrum shows that the most
important term in the double cycle base for predicting the period of the next
cycle is the third term and that the first and fourth terms also make
significant contributions to this prediction.

We have presented three general methods by which the period of a sunspot
cycle may be estimated: 1) the intercept of the estimated cycle with the
threshold sunspot number, 2) use of the estimated cycle in the ADAPT single
base period prediction algorithm presented in Table 5.2, and 3) the
prediction of the period directly from the preceding two sunspot cycles using
the algorithm presented in Table 5.3. Table 5.4 compares the performance of
these three methods in terms of the standard deviation of the error based on
the learning data and the performance in predicting the period of cycle 19.
The predictions for the period of cycles 20 and 21 are also included. It is
interesting to note that the period for both cycles 4 and 9 was underestimated
by the prediction based on the preceding two cycles. Since both the analysis
of Reference 13 and the scatter plots obtained in this study indicate that
cycles 4 and 9 are similar to cycle 20, this suggests that even the estimates
of a significantly longer cycle 20 reported here might actually be
underestimates of the length of cycle 20.*

5.2 Rearward Predictions

Since the methods investigated in this study have shown better than a factor
of two improvement in the ability to predict the sunspot numbers, and the
addition of data such as the angular momentum of the solar system can be
expected to significantly enhance this improvement, it is apparent that the
recovery of earlier sunspot information can be significantly improved by the
application of these techniques to estimating sunspot cycles in a rearward
direction. This is extremely important since it will increase the amount of
learning data available. There is confirming data for sunspot averages based
on historical information such as the auroral displays. Others (for example,
see Reference 14) have shown that this information can be combined with even
relatively crude estimation techniques for recovering estimates of the
sunspot behavior as early as 600 B.C. Thus, the ability to accurately predict
in a rearward direction would allow one to fill in the gaps between the
available observations more accurately. It appears likely that the periods
and perhaps maximum sunspot number would be easiest to recover. The next most
likely quantity to be estimated in a rearward direction is the annual sunspot
numbers. The ability to estimate the 81-day running averages to significantly
early dates will depend greatly on the ability to develop algorithms which can
make use of the annual information to estimate the monthly information. Since
it is unlikely that there will be observations which can be useful in pinning
down monthly values other than through the annual averages, it is unlikely
that the learning data for the 81-day running averages can be extended much
earlier than 1700. On the other hand, it is quite likely that at least 100 and
maybe several thousand years of additional annual data can be obtained. If
this is the case, and if it can be shown that the annual data is useful in
predicting the monthly data, this annual data can then be used to predict
forward beyond cycles 20 and 21 and then used as input to the monthly
predictions for the forward-running information. Thus, the development of
rearward prediction algorithms and the use of these algorithms to recover as
much of the annual sunspot history prior to 1700 as possible should
significantly improve the ability to make long range predictions of sunspot
activity.

* See Appendix D for further discussion of the possible underestimate of the
length of cycle 20.

There are additional advantages to carrying through this rearward prediction 
over a significant length of time. For example, the availability of solar 
activity for a significant length of time (i.e., thousands of years) could
provide sufficient information that this activity could be incorporated into
stellar
models. The verification of stellar model predictions of solar activity would 
be a major breakthrough in the understanding of stellar models and in the 
ability to project the effect of the sun on the solar system's environment for 
the distant past and the very distant future. Another potential benefit from 
studying the sunspot cycles over a significant length of time is the verification 
of the relationship between the angular momentum of the solar system and the 
sunspot cycles. If this proves to be valid, it offers an opportunity to use the 
stellar activity as a basis for inferring information about possible planetary 
systems beyond our solar system. 

Although not required by the present study, it was possible to incorporate a
preliminary analysis of rearward predictions during the early exploratory
studies as part of the development of exploratory representations. As a
result, a preliminary base using cycles 1 through 19 suitable for rearward
predictions of sunspot cycles was developed, and its major characteristics are
presented in Figures 5.11 through 5.17. The major difference between this
base and the single cycle base used for the forward predictions is that the
start of the sunspot history is left open instead of the end, and the end
point of the sunspot history was selected as month number 180. The sunspot
cycle was then plotted with zeros in those months from zero until the first
sunspot number of the history. This had the effect of more highly organizing
the variation of the back half of the sunspot cycles, since they were all
forced through point 180, and disorganizing the first half of the sunspot
cycle. The average of the cycles 1 through 19 constructed
in this manner is presented in Figure 5.11.

The ADAPT representation for the cycles 1 through 19 was then constructed
by subtracting the average of cycles 1 through 19 presented in Figure 5.11
from each of the cycles and processing the resulting histories in the ADAPT
programs to find the optimum representation. Figure 5.12 presents the
information energy as a function of the number of terms retained in the
optimal series representation. We see that for this base the first term
contains approximately 50% of the information as compared to approximately
60% for the forward facing representation. The second term contains
approximately 29% as compared to 18% for the forward facing sunspot cycles,
and the third term contains approximately 9%. Again there appears a second
break at the fifth term in this history. By far the greatest amount of the
information, namely 88%, is contained in the first three terms of this
history. The first five optimal functions are presented in Figures 5.13
through 5.17. Comparison of Figure 5.13 with Figure 4.3 shows that the first
term of the series now contains information over almost the entire cycle. The
second term of the series shown in Figure 5.14 looks very much like the first
term in the forward running sunspot cycles. This behavior is a direct result
of the enhanced order of the second half of the sunspot cycle at the expense
of the first half. The fact that the rearward representation has only 50% of
the information in the first term is an indication that the most natural way
to present the data is in the forward direction. The fourth and fifth optimal
functions presented in Figures 5.16 and 5.17 show the characteristics of the
higher numbered optimum functions for the forward running base, namely, they
present the detailed structure information which is required to fill in the
detailed oscillations occurring in the sunspot cycle.
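
The construction of the rearward base and of the information energy curve can
be sketched as follows. This is an illustration under stated assumptions, not
the ADAPT program: it assumes that the optimal series behaves like a principal
component (Karhunen-Loeve) expansion of the histories after the average is
subtracted, that each history spans months 0 through 180, and it uses
synthetic cycles in place of the observed ones. The function names are
hypothetical.

```python
import numpy as np


def rearward_align(cycle, length=181):
    """Right-align a cycle so that its last month falls on month 180,
    filling the leading months with zeros."""
    cycle = np.asarray(cycle, dtype=float)[-length:]
    out = np.zeros(length)
    out[length - len(cycle):] = cycle
    return out


def information_energy(histories):
    """Cumulative fraction of the variation explained by the first
    1, 2, ... terms of the series (singular values of centered data)."""
    centered = histories - histories.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    energy = singular_values ** 2
    return np.cumsum(energy) / energy.sum()


# Synthetic stand-ins for cycles 1 through 19 (assumption, not report data):
# smooth humps of differing length and height with a little noise.
rng = np.random.default_rng(1)
cycles = []
for _ in range(19):
    n = int(rng.integers(110, 165))           # cycle length in months
    t = np.arange(n) / (n - 1)
    cycles.append(rng.uniform(50, 200) * np.sin(np.pi * t) ** 2
                  + rng.normal(0.0, 3.0, n))

aligned = np.vstack([rearward_align(c) for c in cycles])   # (19, 181)
cumulative = information_energy(aligned)

for terms in (1, 2, 3, 5):
    print(terms, round(float(cumulative[terms - 1]), 3))
```

After the average is subtracted, 19 histories can support at most 18 nonzero
terms, which is consistent with the 18-term representations discussed in the
text.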

Since the availability of an appropriate single cycle base is the only
requirement for extrapolating a data history, it was possible, within the
constraint of the present program, to apply the extrapolation program to this
rearward facing base to complete cycle zero. Since the first optimum function
now contains information over the entire cycle and the break point in the
energy curve occurs at the fifth optimum function, it appears that any number
of terms from 1 to 5 might be the best for extrapolating the sunspot histories
in the rearward direction. Figure 5.18 presents the results of the rearward
extrapolation for 1, 2 and 18 terms.

The 18-term extrapolation is clearly overdetermined, and we have a clear
illustration of the effect of overdetermination here. Namely, the 18-term
representation is a very poor estimate of the future although it does a
reasonably good job of matching the input values. The 3, 4 and 5 term
predictions lie between the 2 term and the 18 term prediction and thus have
not been included in this figure.
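
The trade-off between the number of terms and the quality of the extrapolation
can be explored with a sketch such as the following. It assumes that the
extrapolation amounts to a least squares fit of the leading optimal functions
to the months that are available, followed by evaluation of the fitted series
over the whole cycle; the report does not give the procedure in this section,
so this is only one plausible reading. The basis, average cycle and data are
synthetic, and all names are hypothetical.

```python
import numpy as np


def extrapolate(known_values, known_idx, mean_cycle, basis, n_terms):
    """Fit the first n_terms optimal functions to the known months by least
    squares, then evaluate the fitted series over the entire cycle."""
    design = basis[:n_terms][:, known_idx].T            # (n_known, n_terms)
    target = np.asarray(known_values, dtype=float) - mean_cycle[known_idx]
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return mean_cycle + coeffs @ basis[:n_terms]


def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))


# Synthetic stand-ins (assumptions, not report data).
rng = np.random.default_rng(3)
n_months = 181
q, _ = np.linalg.qr(rng.normal(size=(n_months, 18)))
basis = q.T                                             # 18 functions
mean_cycle = 60.0 * np.sin(np.pi * np.arange(n_months) / 180.0) ** 2
truth = mean_cycle + 30.0 * basis[0] - 12.0 * basis[1]  # a "2-term" cycle

# Rearward case: only the last 60 months are observed, with some noise.
known_idx = np.arange(121, n_months)
unknown_idx = np.arange(0, 121)
observed = truth[known_idx] + rng.normal(0.0, 2.0, size=known_idx.size)

exact_2 = extrapolate(truth[known_idx], known_idx, mean_cycle, basis, 2)
fit_2 = extrapolate(observed, known_idx, mean_cycle, basis, 2)
fit_18 = extrapolate(observed, known_idx, mean_cycle, basis, 18)

for label, fit in (("2 terms", fit_2), ("18 terms", fit_18)):
    print(label,
          "error on known months:", round(rms(fit[known_idx] - observed), 2),
          "error on unknown months:", round(rms(fit[unknown_idx] - truth[unknown_idx]), 2))
```

Adding terms can never worsen the match to the months used in the fit, so a
close match to the input values says little by itself about the quality of
the extrapolated portion.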

Examination of Figure 5.18 indicates that either the 1 or 2 dimensional
reconstruction probably represents the best result. The 2 dimensional
reconstruction already has the difficulty that it has negative sunspot
numbers, although the maximum negative value is only minus 10. Also, if we use
the sunspot number of 5 above zero intercept method of predicting the period,
the 2 term reconstruction extrapolation predicts a period approximately a year
less than that which would be indicated by Reference 14. The 1 term
reconstruction has no negative values of sunspot numbers but tends to
overpredict the Schove minimum by slightly more than a year. If we use the 5
sunspot number above the minimum as the intercept, the 1 term representation
still overpredicts the period by a year, but the 2 term now underpredicts by
only about half a year. Thus, the two term estimate ending at about month 60
appears to be the best estimate of cycle 0. The advantage of adding the 2
preceding cycles to the forward predictions suggests the rearward predictions
can be improved by adding the 2 following cycles. In fact, there may even be
significantly greater gain in the rearward predictions because of the greater
amount of information contained in the rearward terms 2 through 18 as compared
to the forward terms 3 through 18. Thus, it is clear that the ADAPT techniques
would significantly improve the recovery of information from the earlier
cycles. In particular, it has already achieved a somewhat better estimate of
the 81-day running averages from March of 1749 back to early 1744.

5.3 Clustering Studies

The ADAPT programs provide, as by-products of any analysis of data, a series
of outputs which are extremely useful for finding natural groups or clusters
in the data. For example, a plot of the first coefficient versus the second
coefficient of the optimal Fourier series representing each history is the
best two dimensional representation of the data which can be made. This
follows from the fact that the first coefficient explains the greatest amount
of variation that one can explain in any single term representation and that
the first two terms in the optimal series explain the greatest amount of
variation in any possible two term representation of this data.

Since this latter amount of variation is displayed graphically in a two
dimensional scatter plot when these two coefficients are plotted as a function
of each other, one has the best two dimensional representation possible.
Figure 5.19 presents such a plot for the single cycle forward facing base.
Each of the cycles, designated by the numbers enclosed in circles, triangles
or squares, is located according to the values of the first coefficient and
second coefficient of the optimal Fourier series representation. For example,
consider sunspot cycle 1, which is located at an NP 1 coordinate value of
approximately 190 and an NP 2 coordinate value of approximately -95. This
means that to reconstruct sunspot cycle 1, one takes the average of the single
sunspot cycles presented in Figure 4.2 and adds to it, for each month, 190
times the value of the corresponding month in the first optimal function shown
in Figure 4.4, and to that sum adds the product of -95 and the corresponding
value of the second optimal function shown in Figure 4.5. The relationships
between the sunspot cycles as displayed on this scatter plot account for
approximately 80% of the variation in the data.

The circled sunspot cycles on this figure represent the actual values obtained
by projecting the observed sunspot numbers for that cycle on the single cycle
base discussed in Section 4.1. The triangle around cycle 20 indicates that
this position for cycle 20 is based on the third quarter extrapolation of
cycle 20 as described in Section 4.3. Since the cycle has not yet been
completed, this is the best available estimate of cycle 20's location.
Similarly, the square around 21 indicates that this is the best estimate based
on the predictions using the cycles 19 and 20 in the algorithms presented in
Table 4.3.
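
The reconstruction described for cycle 1 above, and the inverse step of
locating a cycle on the scatter plot, can be sketched as follows. The average
cycle and the two optimal functions used here are synthetic stand-ins (the
curves of Figures 4.2, 4.4 and 4.5 are not reproduced), the functions are
assumed to be orthonormal, and the function names are hypothetical. Only the
coordinate values 190 and -95 are taken from the text.

```python
import numpy as np


def reconstruct(mean_cycle, optimal_functions, coefficients):
    """Average cycle plus each coefficient times its optimal function,
    evaluated month by month."""
    coefficients = np.asarray(coefficients, dtype=float)
    k = len(coefficients)
    return mean_cycle + coefficients @ optimal_functions[:k]


def scatter_coordinates(history, mean_cycle, optimal_functions, k=2):
    """Project a history on the first k optimal functions to obtain its
    location on the scatter plot (requires orthonormal functions)."""
    return optimal_functions[:k] @ (np.asarray(history, dtype=float) - mean_cycle)


# Synthetic stand-ins for the average cycle and the first two functions.
rng = np.random.default_rng(2)
n_months = 160
q, _ = np.linalg.qr(rng.normal(size=(n_months, 2)))
optimal_functions = q.T                                  # (2, 160)
mean_cycle = 60.0 * np.sin(np.pi * np.arange(n_months) / (n_months - 1)) ** 2

# Two-term reconstruction using the coordinates quoted for cycle 1.
cycle_1 = reconstruct(mean_cycle, optimal_functions, [190.0, -95.0])

# Projecting the reconstruction recovers the scatter plot coordinates.
coords = scatter_coordinates(cycle_1, mean_cycle, optimal_functions)
print(np.round(coords, 6))
```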

The scatter plot can be used to obtain the same classification of sunspot
cycles according to mode that was reported in Reference 13 by plotting the
maximum sunspot number versus period independently for the positive and
negative sunspot cycles. If one considers the scatter plot to be divided into
two regions by line c-d, one notices that all of the sunspot cycles to the
right of line c-d are mode 1 and those to the left of line c-d are mode 2
sunspot cycles. However, a more careful examination shows that one may also
draw the lines a-b and e-f and then consider the region A to the right of a-b,
B between lines a-b and c-d, C between lines c-d and e-f, and D to the left of
line e-f.

If one now considers the negative and positive cycles independently, one sees 
that there is an even stronger separation between mode 1 and mode 2 for fixed 
polarity. That is, the negative mode 2 cycles all lie in region A and the negative 
mode 1 cycles all lie in region C. Regions A and C are separated by the entire 
expanse of region B. Furthermore, no positive mode 1 cycles lie in region C 
so the positive mode 1 cycles are separated from cycle 19 which has been ten- 
tatively identified as the only known positive mode 2 cycle, by the entire expanse 
of region C. Thus, we see that the scatter plot was capable of identifying the 
separation between mode 1 and mode 2 as a weak separation even when the polarity 
was ignored and that when the polarity was considered the separation became very 
strong. Although these classifications into mode 1 and mode 2 have been found 
independently by Sleeper using more conventional analysis it is hoped that this 
example will illustrate how the ADAPT scatter plot can be used to accomplish 
this analysis. 

The 2-dimensional representation provided by the scatter plot is not the only
useful form of clustering analysis. It is often desirable to perform
clustering analysis in higher dimensional spaces. This is especially true when
the first two dimensions do not explain the great majority of the variation.
Since the first two dimensions explain approximately 80% of the variation
here, it is not expected that a higher dimensional cluster analysis will yield
more significant results. However, to illustrate the technique, the single
cycle data was processed through the ADAPT nearest neighbor program and the
nearest neighbor of each of the sunspot cycles determined. This information is
plotted in Figure 5.20. Figure 5.20 may be used to construct nearest neighbor
trees as follows. Since Figure 5.20 plots the sunspot cycle as the abscissa
and the nearest sunspot cycle to each of the sunspot cycles as the ordinate,
one may read the curve in the other direction and answer the question: For
which cycles is cycle X the nearest neighbor? Each of the cycles for which a
given cycle is the nearest neighbor is assumed to be a member of a grouping
containing cycle X and provides the first branch in the tree presented in
Figure 5.21. The process is then repeated for each element of the branch.
There are three possible results: 1) one might find that a given cycle is not
the nearest cycle to any other cycle, in which case the procedure terminates
for that path, 2) one might find that a given cycle is the nearest cycle to
the cycle which produced that branch, in which case the procedure terminates
for that path, or 3) one may find that a given cycle introduces an entire new
branch and the procedure may be continued. This procedure has been carried
out as shown in Figure 5.21, where three other groups of four or more sunspot
cycles are defined. These four groups are enclosed in the dashed lines shown
in Figure 5.19. The groups are logical groups on this figure, as could be
expected from the fact that Figure 5.19 actually contains 80% of the
variation. Other examples of this nearest neighbor analysis are given in
References 1 and 4, and the reader is referred to these references for more
details on this analysis.
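
The tree construction described above can be sketched as follows. This is an
illustration of the procedure as worded in the text and not the ADAPT nearest
neighbor program: Euclidean distance between coefficient vectors is an
assumption, the six points are invented for the example, and the function
names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def nearest_neighbors(points):
    """Index of the nearest other point for each row (Euclidean distance)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    return dist.argmin(axis=1)


def grow_group(root, nearest):
    """Start from a root and repeatedly ask: for which points is this one
    the nearest neighbor? Collect every point reached in this way."""
    claimed_by = defaultdict(list)
    for point, neighbor in enumerate(nearest):
        claimed_by[int(neighbor)].append(point)
    group, frontier = {root}, [root]
    while frontier:
        current = frontier.pop()
        # No entry for current: result 1 in the text, the path terminates.
        for child in claimed_by[current]:
            if child not in group:            # result 2: already on the branch
                group.add(child)              # result 3: a new branch
                frontier.append(child)
    return group


# Invented coefficient pairs forming two loose clusters.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0],
                   [10.0, 0.0], [11.0, 0.0], [10.0, 1.5]])
nearest = nearest_neighbors(points)

print(nearest.tolist())                 # [1, 0, 1, 4, 3, 3]
print(sorted(grow_group(0, nearest)))   # [0, 1, 2]
print(sorted(grow_group(3, nearest)))   # [3, 4, 5]
```

The same functions apply unchanged to coefficient vectors of more than two
dimensions, which is the case for which the text recommends the nearest
neighbor analysis over the scatter plot.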

As pointed out, these clustering outputs are by-products of the ADAPT analysis
and have been included in the present report to illustrate some of the
potential of the ADAPT programs for further analysis. There is no intention
that this report be a complete clustering analysis, as the major objective of
this study was the development of advanced prediction techniques. However, to
provide the reader with the capability to perform clustering analysis which
may be useful for other purposes, the scatter plots for the other two bases
which have been used, namely the single cycle rearward prediction base and the
double cycle base, are given in Figures 5.22 and 5.23 with the location of the
sunspot cycles indicated on these figures. The number for each cycle on Figure
5.23 is the number associated with the first cycle in the pair for the double
cycle. For example, No. 2 on Figure 5.23 is the location of double cycle 2-3.

5.4 Recommendations

This section briefly summarizes the recommended analysis suggested by the
preceding three sections. The result most pertinent to the present study is
that the best estimate of the period of a sunspot cycle for which one has
estimated the sunspot numbers is given by applying the algorithm presented in
Table 5.2. This algorithm yields a period of approximately 150 months for
cycle 20, which has been incorporated in the prediction shown in Figure 2.1.

The studies carried out here have shown that there is great potential for
recovering additional sunspot data from historical records by applying the
ADAPT techniques. To accomplish this it is recommended that the same
developments outlined in Section 4.5 to improve the forward predicting
algorithms be incorporated into developing algorithms for predicting rearward
cycles. In addition, it is recommended that these algorithms be developed to
predict the individual properties of sunspot cycles such as period, maximum
sunspot number and time of maximum in both the forward and rearward
directions. It is also recommended that algorithms be developed in both the
forward and rearward directions to predict the annual average sunspot numbers.
The annual sunspot numbers for the cycle being predicted should also be
included in the data vector for predicting the 81-day running average sunspot
number, since it is likely that the result of the analysis suggested here will
be that one can predict the annual sunspot cycles significantly better than
the monthly cycles. Having developed the algorithms to predict the properties
of the sunspot cycles and the annual values, it is recommended that these
algorithms be incorporated in an analysis similar to that carried out in
Reference 14 to determine the best estimate of sunspot activity to the onset
of available records or at least 600 B.C. When this has been accomplished, it
is recommended that the implications of both the sunspot activity and its
relationship to the angular momentum of the solar system be applied to the
construction of stellar models and to developing observational techniques for
gaining information about planetary systems.

It is also recommended that the ADAPT clustering analysis be used for "a 
scientific fishing trip" to determine if there are any groupings of interest. 

This analysis should be carried out using both single and double bases, 
scatter plots and nearest neighbor analysis. Any groupings which are found 
such as those enclosed in dashed circles in Figure 5. 19 should be studied 
individually. The ADAPT programs may be used for this both to construct 
average sunspot cycles for each of the groups and to construct relative impor- 
tance vectors for separating each of the groups from one another and from all 
of the remaining data. These relative importance vectors will show exactly 
what portions of the cycles make each of the groups stand out as a group and 
can be used as a basis for trying to understand the reason for the groupings 
observed.

TABLE 5.4 COMPARISON OF THREE METHODS FOR ESTIMATING THE PERIOD OF A SUNSPOT
CYCLE

FIGURE 5.1 EFFECT OF ADAPT PREDICTIONS ON TREND OF PEAK MAGNITUDE FOR
NEGATIVE CYCLES

FIGURE 5.2 LOCATION OF ADAPT PREDICTIONS ON MAX SUNSPOT NUMBER VERSUS
PERIOD PLOT

FIGURE 5.3 ESTIMATED VERSUS ACTUAL PERIODS USING SIX DIMENSIONS OF THE
SINGLE CYCLE BASE

FIGURE 5.4 RELATIVE IMPORTANCE VECTOR FOR PREDICTING THE LENGTH OF A CYCLE
FROM THE ESTIMATED CYCLE USING SIX DIMENSIONS

FIGURE 5.5

FIGURE 5.6 ESTIMATED VERSUS ACTUAL PERIOD USING TWO DIMENSIONS OF THE
SINGLE CYCLE BASE

FIGURE 5.9 RELATIVE IMPORTANCE VECTOR FOR PREDICTING PERIOD FROM THE
PRECEDING TWO CYCLES

FIGURE 5.10 RELATIVE IMPORTANCE SPECTRUM FOR PREDICTING THE PERIOD FROM
THE COEFFICIENTS OF THE PRECEDING TWO CYCLES

FIGURE 5.11 REARWARD PREDICTION-SUNSPOT AVERAGE INPUT VECTOR-THREE-MONTH
RUNNING AVERAGE-CYCLES 1 THRU 19

FIGURE 5.13 REARWARD PREDICTION-FIRST OPTIMAL FUNCTION-THREE-MONTH
RUNNING AVERAGE-CYCLES 1 THRU 19

FIGURE 5.14 REARWARD PREDICTION-SECOND OPTIMAL FUNCTION-THREE-MONTH
RUNNING AVERAGE-CYCLES 1 THRU 19

FIGURE 5.15 REARWARD PREDICTION-THIRD OPTIMAL FUNCTION-THREE-MONTH
RUNNING AVERAGE-CYCLES 1 THRU 19

FIGURE 5.16 REARWARD PREDICTION-FOURTH OPTIMAL FUNCTION-THREE-MONTH
RUNNING AVERAGE-CYCLES 1 THRU 19

FIGURE 5.17 REARWARD PREDICTION-FIFTH OPTIMAL FUNCTION-THREE-MONTH
RUNNING AVERAGE-CYCLES 1 THRU 19
(Axis labels: INDEXING VARIABLE; SUNSPOT NUMBER)

FIGURE 5.19 SCATTER PLOT FORWARD SINGLE CYCLE BASE

FIGURE 5.20 NEAREST NEIGHBOR PLOT SINGLE CYCLE BASE

FIGURE 5.21 NEAREST NEIGHBOR TREE SINGLE CYCLE BASE

FIGURE 5.23 SCATTER PLOT DOUBLE CYCLE BASE


APPENDIX A


FEATURES OF ADAPT ANALYSIS 


The unique aspect of the ADAPT approach to empirical data analysis is that the
analysis is preceded by the derivation of the optimal representation for the
particular data set. The ADAPT programs provide a unique capability for
determining this optimum representation for large data sets. However,
regardless of the size of the data set, the availability of this optimum
representation provides many significant benefits to any further empirical
analysis. These benefits include: 1) definition of which variables dominate
the variation, 2) ordering of the data by its general usefulness for
extracting information, 3) reduction in the computation required to perform
further analysis, 4) reduction in the amount of learning data required to
perform any given analysis, 5) an improved ability to establish performance
and validity criteria, and 6) the ability to perform special functions such as
clutter subtraction and extrapolation.

The availability of the optimum functions for representing any given data set 
is analogous to having the governing differential equations for a classical physics 
problem. These optimum functions provide information regarding the nature of 
the physics which govern the phenomena associated with this data. In particular, 
these functions will define exactly where the greatest and most highly correlated 
variation from case to case occurs. This information can be extremely useful 
in selecting data to be used for the analysis and in understanding the mechanism 
governing the phenomena which produced this data. 

In addition to simply having the optimum functions for representing the data, 
these functions are ordered such that each function explains successively less 
variation in the data. This provides the user with a capability to reject variables 
in an intelligent rather than a random manner, if the resources or available 
learning data require the use of fewer dimensions than would naturally be used 
to describe the data. This ordering allows one to throw away those variables 
which explain the smallest amount of variation and therefore in general should 
be least useful to any analysis. Although it might be more desirable to be 
selective based on the particular analysis to be performed, this is not usually 
possible until after the analysis has been performed, when it is obviously no 
longer useful. Thus, it is almost axiomatic that the a priori rejection of data 
for a particular analysis cannot be based on that particular analysis, so 
rejection based on explained variation is an attractive approach to eliminating 
data when the realities of the resources or available learning cases make such an 
elimination necessary. 

Regardless of any prior decision to reduce the dimensionality, the ADAPT 
approach to any real problem will automatically lead to a significant reduction 
in dimensionality. When the information energy curves which are produced 
by the ADAPT programs are examined, it is almost always possible for the 
analyst to select some dimensionality beyond which it is inconceivable that 
further useful information is incorporated in the data. This criterion alone 
usually results in a reduction of dimensionality of more than an order of 
magnitude. 
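
The selection of a dimensionality from an information energy curve can be sketched as follows. This is a minimal illustration and not the ADAPT implementation: it assumes that the optimal functions are the eigenvectors of the second-moment matrix of the learning histories (obtained here from a singular value decomposition), and the names `optimal_functions` and `dimensions_needed`, the synthetic data, and the 99 percent threshold are all hypothetical choices. 

```python
import numpy as np

def optimal_functions(histories):
    """Return (eigenvalues, functions) for learning histories stored as rows.

    Assumption: the optimal functions are the eigenvectors of the
    second-moment matrix, ordered by decreasing explained variation.
    """
    histories = np.asarray(histories, dtype=float)
    # Rows of vt are orthonormal; singular values come out in descending order.
    _, s, vt = np.linalg.svd(histories, full_matrices=False)
    eigenvalues = s**2 / histories.shape[0]
    return eigenvalues, vt

def dimensions_needed(eigenvalues, fraction=0.99):
    """Smallest number of leading functions explaining `fraction` of the variation."""
    energy = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    count = int(np.searchsorted(energy, fraction)) + 1
    return min(count, len(eigenvalues))

# Synthetic learning set: 50 histories built from 3 underlying shapes plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
shapes = np.array([np.sin(np.pi * k * t) for k in (1, 2, 3)])
histories = rng.normal(size=(50, 3)) @ shapes + 1e-3 * rng.normal(size=(50, 200))

eigenvalues, functions = optimal_functions(histories)
kept = dimensions_needed(eigenvalues)
```

In this synthetic case each history has 200 values, but only the three leading functions are needed to account for essentially all of the variation. 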

A reduced dimensionality obviously allows one to perform computations with 
smaller computer capabilities. Furthermore, the orthogonality of the optimum 
representation also provides simplifications in the computation. For example, 
in the optimal ADAPT space one can in some cases derive the Fisher discrim- 
inant without inverting the covariance matrix. This combination of reduction 
in quantity of computation required and simplification due to orthogonality also 
makes it feasible to update classification and regression algorithms in real 
time for cases where this might otherwise be impossible. 

A more significant aspect of the lower dimensionality of the learning space 
follows from the requirement that the amount of learning data be large compared 
to the dimensionality of the learning data. This requirement arises 
from a situation analogous to fitting a third order polynomial through a 
series of points. If the third order polynomial, which has four coefficients, is 
to be fitted to four points, it will always fit perfectly and no physical 
relationship need be involved. However, if the third order polynomial fits a 
hundred points well, then one knows that this third order polynomial must be 
related to the data in some physical manner. The same is true for empirical analysis in general. If 
the number of dimensions of the learning space is equal to the number of learn- 
ing cases one can expect most empirical algorithms to provide a perfect fit to 
the learning data. However, this fit is normally based on differences between 
the population and the sample statistics and is not based on the physics of the 
problem. Experience has shown that the number of learning cases required to 
derive an empirical algorithm varies from 2 to 6 times the number of dimensions 
of the learning space. Thus, the usual ADAPT reduction of an order of mag- 
nitude or more in dimensionality of the learning space translates immediately 
into an equivalent reduction in the requirement for learning data. Since obtain- 
ing learning data is one of the most expensive aspects of empirical data analysis, 
this attribute of the ADAPT approach is often sufficient by itself to make the 
difference between feasibility and infeasibility of solving a given problem. 
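
The polynomial analogy can be checked numerically. The sketch below is an illustration only (the helper name `cubic_fit_rms` and the random test data are assumptions): a third order polynomial has four coefficients, so it passes exactly through any four points, even pure noise, while the same fit to a hundred noise points leaves a residual close to the noise level. 

```python
import numpy as np

def cubic_fit_rms(x, y):
    """RMS residual of a least-squares third order polynomial fit."""
    coefficients = np.polyfit(x, y, deg=3)
    residual = y - np.polyval(coefficients, x)
    return float(np.sqrt(np.mean(residual**2)))

rng = np.random.default_rng(1)

# Pure noise: there is no physical relationship to discover.
x_few, y_few = np.linspace(0.0, 1.0, 4), rng.normal(size=4)
x_many, y_many = np.linspace(0.0, 1.0, 100), rng.normal(size=100)

rms_few = cubic_fit_rms(x_few, y_few)     # essentially zero: 4 coefficients, 4 points
rms_many = cubic_fit_rms(x_many, y_many)  # close to the noise level
```

A perfect fit in the first case therefore says nothing about the data, which is the reason for requiring several learning cases per dimension. 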

The ADAPT representation also provides an opportunity for establishing a 
necessary, although not sufficient, validity criterion. A validity criterion provides 
a method of determining whether a particular test case is from the same population 
as the learning data, and therefore determines the validity of applying 
the algorithms derived from the learning data to that particular test case. The 
ADAPT validity criterion consists of comparing the length of the test data vector 
in the original data space and in the ADAPT optimum space. If this transformation 
from the original data space to the optimum space results in a shortening 
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of the test data vector by a factor considerably greater than the shortening 
which the learning data vectors suffered, one has an indication that the test 
data and learning data are from different populations. In addition to providing 
this validity criterion, the ADAPT programs have been designed to calculate 
performance criteria as part of the learning process. These performance 
criteria provide the analyst with a basis for immediately evaluating how well 
he can expect a given algorithm to perform on test data. The ADAPT programs 
provide the analyst with both the performance criteria and the experience factor 
required to determine whether the algorithm derived is overdetermined. If 
the algorithm is overdetermined, the analyst must adjust the dimensionality 
of the problem or increase the quantity of learning data to derive a physically 
meaningful algorithm. 
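
The length comparison behind the validity criterion can be sketched as follows. This is an illustrative reading of the description above, not the ADAPT code: the function names, the `margin` parameter, and the acceptance rule (compare the test vector's shortening with the worst shortening among the learning vectors) are assumptions. 

```python
import numpy as np

def length_ratio(vectors, basis):
    """Length in the optimum space divided by length in the original space.

    `basis` holds the retained orthonormal optimal functions as rows.
    """
    vectors = np.atleast_2d(np.asarray(vectors, dtype=float))
    projected = vectors @ basis.T
    return np.linalg.norm(projected, axis=1) / np.linalg.norm(vectors, axis=1)

def passes_validity_check(test_vector, learning_vectors, basis, margin=0.9):
    """Hypothetical rule: accept unless the test vector shortens much more
    than the worst learning vector did."""
    worst_learning = length_ratio(learning_vectors, basis).min()
    return bool(length_ratio(test_vector, basis)[0] >= margin * worst_learning)
```

A test vector lying largely outside the retained optimum space loses most of its length under the transformation and fails the check. Passing the check is necessary but not sufficient for validity. 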

The ADAPT approach of obtaining the optimum representation of the data prior 
to performing the analysis introduces the capability to perform clutter sub- 
traction on the data prior to performing the analysis. The clutter subtraction 
can be used to eliminate any characterizable aspect of the signature from the 
data histories. This is accomplished by subtracting the coordinate directions 
corresponding to those characteristics to be eliminated from the space prior 
to the optimization procedure. Another unique capability resulting from the 
optimum representation step is the ability to do an extrapolation making use 
of both historical data from previous data histories and the available portion 
of any given data history. Conceptually this is equivalent to utilizing historical 
information to guide the interpolation over missing data points. 
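
Clutter subtraction as described above amounts to removing known coordinate directions from the data before the optimal representation is derived. A minimal sketch, assuming the clutter is characterized by a small set of direction vectors (the function name and argument layout are hypothetical): 

```python
import numpy as np

def subtract_clutter(histories, clutter_directions):
    """Remove from each history (row) its component along the clutter directions."""
    histories = np.asarray(histories, dtype=float)
    clutter = np.atleast_2d(np.asarray(clutter_directions, dtype=float))
    # Orthonormal basis (columns of q) for the span of the clutter directions.
    q, _ = np.linalg.qr(clutter.T)
    return histories - (histories @ q) @ q.T
```

The cleaned histories are orthogonal to every clutter direction, so the subsequent optimization cannot spend any of its leading functions on the eliminated characteristics. 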

In addition to these advantages which accrue from the optimal representation, 
the ADAPT programs have been operational since approximately 1965. They 
have been applied to a great many different problems, and during this period 
some of the practical pitfalls associated with empirical analysis have been 
encountered and overcome, and the programs have been improved to take advantage 
of this experience. This experience has also provided Avco with the understanding 
of what diagnostic outputs are required to enhance the ability of the analyst to 
develop the required algorithms, and to provide the data necessary to reintroduce 
the physics to the problem at as many points as possible. The key areas where 
the physics may be reconsidered as part of the analysis are: 1) at the time of 
data selection and preprocessing decisions; 2) after the development of the 
optimum representation, when it may be examined to ensure that the variation is 
consistent with the expected variation based on the physics of the problem; and 
3) after the development of the algorithm, when the relative importance vector may 
be examined to determine whether the variables which appear important to the 
decision are consistent with the analyst's understanding of the physics, and the 
relative importance spectrum may be examined to determine whether the difficulty 
in obtaining the algorithm is consistent with the difficulty which would be 
expected based on the physics of the problem. 

In summary, the capability to find the optimum representation for large data 
vectors has been combined with many years of experience in using this representa- 
tion as a preliminary step preceding empirical data analysis. This unique 
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combination has been used to prepare a set of computer programs for per- 
forming empirical data analysis. These programs provide the user with a 
fast and economical way to generate simple empirical algorithms for 
classification, regression, clustering and extrapolation and/or analysis 
of any given set of learning data. 
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APPENDIX B 

OPTIMAL ORTHOGONAL EXPANSION FOR TWO FUNCTIONS 

We wish to carry through the ADAPT expansion of each of two given functions in the 
series of the optimal orthogonal functions defined by these two functions, as 
described in the Introduction. 

Suppose we are given the functions $u_1(t)$ and $u_2(t)$ of the independent variable $t$ 
over some domain $t_1 \le t \le t_2$. Let the functions be normalized, so that 

$$\int_{t_1}^{t_2} u_1^2 \, dt = \int_{t_1}^{t_2} u_2^2 \, dt = 1 .$$

Then the only parameter is the product integral 

$$\kappa = \int_{t_1}^{t_2} u_1 u_2 \, dt , \qquad |\kappa| \le 1 ,$$

the last inequality being Schwarz's inequality for normalized functions. 

First we construct an orthonormal set of 2 functions $v_1$, $v_2$ from the given ones 
by the Gram-Schmidt procedure. These functions are easily seen to be 

$$v_1 = u_1 , \qquad v_2 = (u_2 - \kappa u_1) / \sqrt{1 - \kappa^2} .$$

We now find the expansion coefficients of $u_1$, $u_2$ in a series of $v_1$, $v_2$: 

$$u_1 = v_1 , \qquad u_2 = \kappa v_1 + \sqrt{1 - \kappa^2}\, v_2 .$$
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The optimal orthogonal functions are now obtained by finding the eigenvalues 
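and eigenvectors $d^{(1)}$, $d^{(2)}$ of the two-by-two matrix 

$$S = \frac{1}{2} \begin{pmatrix} 1 + \kappa^2 & \kappa \sqrt{1 - \kappa^2} \\ \kappa \sqrt{1 - \kappa^2} & 1 - \kappa^2 \end{pmatrix}$$

(the factor in front corresponds to weighting by dividing by the number of functions, 
in our case 2). They are easily found to be 

$$\lambda_1 = \tfrac{1}{2} (1 + \kappa) , \qquad \lambda_2 = \tfrac{1}{2} (1 - \kappa) ,$$

$$d^{(1)} = \left( \sqrt{\lambda_1},\ \sqrt{\lambda_2} \right) , \qquad d^{(2)} = \left( \sqrt{\lambda_2},\ -\sqrt{\lambda_1} \right) .$$

The eigenvectors are the expansion coefficients of the optimal orthogonal functions 
$h_1$, $h_2$ in a series in $v_1$, $v_2$, i.e., 

$$h_i = d^{(i)}_1 v_1 + d^{(i)}_2 v_2 , \qquad i = 1, 2 .$$

Returning to the original u functions we find the associated optimal functions 
to be 

$$h_1 = \frac{u_1 + u_2}{\sqrt{2 (1 + \kappa)}} , \qquad h_2 = \frac{u_1 - u_2}{\sqrt{2 (1 - \kappa)}} ,$$

and the expansions of the u functions in them are 

$$u_1 = \sqrt{\lambda_1}\, h_1 + \sqrt{\lambda_2}\, h_2 , \qquad u_2 = \sqrt{\lambda_1}\, h_1 - \sqrt{\lambda_2}\, h_2 .$$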
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It is sufficient to discuss the case of $\kappa \ge 0$ because if $\kappa < 0$, a change in the sign 
of $u_2$ returns to the first case. We note that the optimal function $h_1$ is proportional 
to the average of the input functions. The average is intuitively the best single 
function to represent two functions, so we see the best single function is associated 
with the larger eigenvalue $\lambda_1$. The optimal function associated with $\lambda_2$ is 
proportional to the difference of the given functions. 

We also note that 

$$\lambda_1 - \lambda_2 = \kappa \le 1 .$$

The decrease in the eigenvalue from the first to the second is the product integral 
of the two functions. If the functions are closely correlated one would expect $\kappa$ 
to be near unity, and $\lambda_2$ would be much less than $\lambda_1$. But if the functions are 
nearly uncorrelated one would expect $\kappa$ to be small, and there is only a slight 
decrease in the eigenvalue, going from the larger to the smaller. Thus the rate 
of decrease of eigenvalues can be associated with the degree of correlation of 
the input functions. 
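
The two-function results can be verified numerically. The sketch below is an illustration under stated assumptions: the two functions are sampled sinusoids chosen arbitrarily, the integrals are replaced by sums over the samples, and the variable names (`kappa`, `lam1`, `lam2`, `h1`) are labels introduced here for the product integral, the two eigenvalues, and the first optimal function. 

```python
import numpy as np

# Two sampled, normalized input functions (arbitrary illustrative choice).
t = np.linspace(0.0, 1.0, 501)
u1 = np.sin(2.0 * np.pi * t)
u2 = np.sin(2.0 * np.pi * t + 0.6)
u1 /= np.linalg.norm(u1)
u2 /= np.linalg.norm(u2)

kappa = float(u1 @ u2)  # discrete stand-in for the product integral

# Gram-Schmidt orthonormal pair.
v1 = u1
v2 = (u2 - kappa * u1) / np.sqrt(1.0 - kappa**2)

# Rows of A are the expansion coefficients of u1 and u2 in v1, v2.
A = np.array([[u1 @ v1, u1 @ v2],
              [u2 @ v1, u2 @ v2]])
S = A.T @ A / 2.0  # weighted by the number of functions

# eigh returns eigenvalues in ascending order.
values, vectors = np.linalg.eigh(S)
lam2, lam1 = values
d1 = vectors[:, 1]
h1 = d1[0] * v1 + d1[1] * v2  # first optimal function
```

For these inputs `kappa` is positive, the eigenvalues agree with one half of one plus or minus `kappa`, and `h1` is parallel (up to sign) to the sum of the two input functions. 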
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APPENDIX C 


DESCRIPTION OF ADAPT HISTORY EXTRAPOLATION PROGRAM 

This appendix contains a mathematical description of the method used in this 
project to extrapolate a complete sunspot cycle from a partial one and a set 
of optimal vectors derived by ADAPT analysis of past cycles. 

Given T values of an input history vector which is normally N (T < N) values 
in length, i.e., 

$$U^S = (u_R, u_{R+1}, \ldots, u_{R+T-1}) ,$$

and a set of optimal data vectors NR in number, with each vector containing N 
values, 

$$h_{ij} , \qquad i = 1, 2, \ldots, N ; \quad j = 1, 2, \ldots, NR ,$$

and an assumption that the data vector $U$ of which $U^S$ is a segment is well 
represented by these optimal functions, this program will calculate the entire 
vector $\hat{U}_i$ $(i = 1, 2, \ldots, N)$ by estimating the coefficients $Y$ of the vector from 
the given segment of the history and the corresponding region of the optimal 
vectors. Two cases can be distinguished depending upon the value of 

$$T / NR$$

and the mathematics for each is described below. 

Case 1: $T / NR \le 1$ 

Let $H^S = (h_{ij})$, $i = R, R+1, \ldots, R+T-1$; $j = 1, \ldots, T$. 

Set up 

$$U^S = H^S Y^S ,$$

which is T equations in T unknowns ($Y^S$). 

Now this can be solved exactly for $Y^S$: 

$$Y^S = (H^S)^{-1} U^S ,$$

and the history can be estimated by 

$$\hat{U}_i = \sum_{j=1}^{T} h_{ij} Y^S_j , \qquad i = 1, 2, \ldots, N .$$

In this case the T points of the estimated history $\hat{U}$ will equal the 
corresponding data points of the segment $U^S$. 


Case 2: $T / NR > 1$ 

Let $H^P = (h_{ij})$, $i = R, R+1, \ldots, R+T-1$; $j = 1, \ldots, NR$. 

Set up 

$$U^S = H^P Y^P ,$$

which is T equations in NR unknowns ($Y^P$). 

Now this cannot be solved exactly. One method of solution is by least 
squares: 

$$Y^P = (H^{PT} H^P)^{-1} H^{PT} U^S ,$$

where $H^{PT}$ is the transpose of $H^P$. 

Again we can reconstruct the history, i.e., 

$$\hat{U}_i = \sum_{j=1}^{NR} h_{ij} Y^P_j , \qquad i = 1, 2, \ldots, N .$$

In this case the T points of the estimated history $\hat{U}$ which correspond to 
the T points of the segment $U^S$ are not exactly equal, but rather differ by 
an amount which has been minimized by the least squares technique. 
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For the case $T / NR = 1$, both methods are exactly identical and therefore 
either can be used. 


Also attached to this appendix is a copy of the FORTRAN listing of the program 
which performs the analysis just described. 
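
For readers without access to the FORTRAN listing, the two cases can be sketched in a few lines of Python. This is an independent illustration of the mathematics above and not a transcription of the listing: the function name `extrapolate_history`, the 0-based `start` index (standing in for R), and the use of a general least squares solver for both cases are assumptions. When the segment has no more points than there are optimal vectors, only the first T vectors are used, so the system is square and the solver returns the exact solution. 

```python
import numpy as np

def extrapolate_history(segment, start, optimal_vectors):
    """Estimate a full history of length N from a T-point segment.

    segment:         the T known values, located at rows start .. start+T-1
    optimal_vectors: array of shape (N, NR); each column is one optimal vector
    """
    segment = np.asarray(segment, dtype=float)
    optimal_vectors = np.asarray(optimal_vectors, dtype=float)
    t = segment.size
    nr = optimal_vectors.shape[1]

    # Case 1 (T <= NR): keep the first T vectors, giving T equations in T unknowns.
    # Case 2 (T > NR): keep all NR vectors, giving an overdetermined system.
    used = optimal_vectors[:, :min(t, nr)]
    h_segment = used[start:start + t, :]

    # Least squares solution; exact when h_segment is square and nonsingular.
    coefficients, *_ = np.linalg.lstsq(h_segment, segment, rcond=None)

    # Reconstruct the entire history from the estimated coefficients.
    return used @ coefficients
```

For example, a call such as `extrapolate_history(first_93_months, 0, vectors)` would correspond to an extrapolation based on the first 93 months of a cycle, where `first_93_months` and `vectors` are placeholders for the observed segment and the previously derived optimal vectors. 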
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APPENDIX D 


COMPARISON OF ESTIMATED AND ACTUAL SUNSPOT CYCLES 
FROM THE LEARNING CASES 


This appendix presents a comparison of the actual and predicted sunspot 
numbers for cycles 3 through 18, which were the learning data for developing 
the prediction algorithm discussed in this report. Figures D-1 through D-16 
present the actual sunspot numbers as the dashed lines. These figures present a 
graphical interpretation of the RMS error of 17.4 on the learning data for the 
ADAPT prediction algorithm. 

Figures D-17 through D-35 present the information required to make a 
similar comparison for the ADAPT third quarter extrapolation. Figures D-17 
and D-19 give the actual sunspot number for cycles 1 and 2, and Figures 
D-18 and D-20 through D-35 give the extrapolated sunspot numbers, based 
on the first 93 months of the cycle. 

The predictions, for which Figures D-1 through D-16 indicate the performance, 
are used for the recommended estimate of cycle 21. The third quarter 
extrapolations, for which Figures D-17 through D-35 indicate the performance, are 
used for the recommended estimate of the remainder of cycle 20. 

Comparison of the predicted and extrapolated cycles with the actual cycles 
shows that the third quarter extrapolations are significantly better than the 
predictions. In both cases, it is interesting to note that the prediction of the 
location of the short term oscillation is quite good. The amplitude, 
and in some cases the phase, of the short term oscillation is predicted quite 
poorly by the prediction algorithm. When the error in the underlying basic cycle is 
also considered, the extrapolated values for the short term oscillations are 
actually quite good. Thus, we conclude that one can use the estimate presented 
in Figure 2.1 of the report to infer the general characteristics of the short 
term (scale of several months) behavior for cycle 20, but only as an indication 
of when a spike might occur for the estimates of cycle 21. 

Sleeper, in Reference 13, has pointed out that cycles 4 and 9 should be considered 
as models of cycle 20. Thus, a comparison of the extrapolation of 
these two cycles with their actual values may be used to anticipate the types 
of error which might be expected in the estimates of sunspot numbers presented 
in Figure 2.1. Note that the prediction technique compared in Figures D-2 
and D-7, which significantly underpredicted both of these cycles, was not used 
in the estimate of cycle 20. However, comparison of the extrapolation of cycle 
4 shown in Figure D-22 with the actual cycle presented in Figure D-2, 
assuming the cycle ends at a sunspot number of 5, shows that the period is still 
underpredicted by approximately 1 1/2 to 2 years and in general the sunspot 
activity for the last quarter is underpredicted by 5 to 20 sunspot numbers. 
Comparison of the extrapolation of cycle 9 shown in Figure D-27 with the 
actual values of cycle 9 shown in Figure D-7 shows that both the period and 
sunspot numbers are estimated quite well for cycle 9. Thus, one concludes 
that the estimate of the current cycle presented in Figure 2.1 might range 
from quite good to slightly underestimating the sunspot activity and period 
of cycle 20. 
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FIGURE D-6 
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FIGURE D-11 
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FIGURE D-13 
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FIGURE D-14 
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FIGURE D-15 
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FIGURE D-16 
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FIGURE D-17 


ACTUAL SUNSPOT CYCLE 1 
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PREDICTED SUNSPOT CYCLE 1 
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FIGURE D-19 
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PREDICTED SUNSPOT CYCLE 5 
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FIGURE D-24 
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FIGURE D-25 


PREDICTED SUNSPOT CYCLE 7 
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FIGURE D-29 
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FIGURE D-30 

PREDICTED SUNSPOT CYCLE 12 
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PREDICTED SUNSPOT CYCLE 13 
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